A machine learning (ML) researcher focused on clinical applications, with 6+ years of research & industry experience.
Currently performing research in WangLab whilst completing a PhD at the University of Toronto.

Research Experience

ECG Foundation Model

U of T's WangLab
Aug 2023 - Present

Applying state-of-the-art contrastive pretraining to build a foundation model for ECG diagnosis from 1.7 million ECGs (800k from UHN's internal dataset).

Technologies

  • PyTorch
  • BERT
  • Contrastive pretraining

SMaRTBRAIN - Dementia Classification

U of T's WangLab
Feb 2024 - Present

Applying multimodal and longitudinal techniques to neurodegenerative disease diagnosis & prognosis, using 3D brain perfusion scintigraphy & MRI data from Sunnybrook.

Technologies

  • PyTorch
  • MONAI
  • Hugging Face

Deformable Image Registration

Sunnybrook's BrainLab
May 2021 - Jan 2022

Developed an unsupervised 3D U-Net model variant for deformable, pairwise image registration of whole-brain MRI data under Dr. Maged Goubran & Ahmadreza Attarpour.

Technologies

  • PyTorch
  • MONAI
  • TensorFlow
  • VoxelMorph
  • Compute Canada
  • Neurite
  • ITK-SNAP

UofT AI Director of Education

UofT AI
Feb 2020 - Apr 2022

Led LearnAI, a free course teaching ML to undergrads, as the club Director of Education.
Acted as curriculum development lead, administrative coordinator, lecturer, TA, and project mentor. Partnered with AI Commons to offer the course to people in emerging countries.

Work Experience

Machine Learning Researcher

University Health Network
Sep 2022 - Aug 2023

Performed machine learning for health research at U of T's WangLab.

Performed multi-source domain adaptation & generalization for MRI, with test-time input adaptation (conditional image generation) using content-style disentanglement.

Developed a subject-adaptive framework for fMRI-based visual stimuli reconstruction using Versatile Diffusion, a multimodal latent diffusion model. Encouraged subject-invariant latent spaces using a subject classifier, Gradient Reversal Layer, and ArcFace loss.

Began the SMaRTBRAIN & ECG foundation model projects, which carried into the PhD.

Topics

  • Domain adaptation
  • Content-style disentanglement
  • Loss-conditioned DDPM
  • Visual stimuli reconstruction
  • Representation learning
  • Latent diffusion model

Applied Machine Learning Intern

Vector Institute
Jan 2022 - Sep 2022

Core developer on CyclOps, an open-source evaluation framework for ML research and deployment in healthcare, as a member of Vector’s AI engineering team.

Researched two risk-prediction use cases using CyclOps: leading a Vector-WangLab research collaboration predicting cardiovascular risk and all-cause mortality on the GEMINI dataset, and contributing to internal Vector research on all-cause mortality and ER length of stay on the MIMIC-IV dataset.

Developed generalizable, high-level querying and processor APIs, consisting of general task-based pipelines to query, clean, featurize, normalize, aggregate, impute, and vectorize tabular and temporal clinical data.

Technologies

  • PyTorch
  • scikit-learn
  • MLflow
  • NumPy
  • Pandas
  • SQLAlchemy
  • SLURM

Data Scientist

Pharos Platforms
Jan 2020 - Dec 2021

Developed a statistical methods & analysis package, consisting of end-to-end workflows to perform climate timeseries forecasting, based on modern statistical research.

Workflows

  • EDA, reporting, and cleaning of spatio-temporal timeseries data
  • Forecasting methods using climate simulations & observed re-analysis data
  • Statistical analysis of temperature & precipitation data

Technologies

  • SciPy
  • NumPy
  • Pandas
  • Matplotlib
  • xarray
  • GCP
  • BigQuery
  • Jupyter Notebook

Machine Learning Engineer

Pre-form Fitness
Sep 2019 - May 2020

Developed computer vision pipelines to help gym users improve their workout form:

  • Use deep-learning pose estimation to predict an exercise being performed
  • Detect and classify repetitive motion as repetitions of an exercise
  • Track multiple users using facial recognition and efficient, real-time object tracking

Education

  • PhD, Laboratory Medicine & Pathobiology
    University of Toronto
    Sep 2023 - Present
  • PhD candidate supervised by Dr. Bo Wang; performing research focused on innovative machine learning techniques (foundation models, multimodal integration, and longitudinal modeling) for clinical applications relating to physiological signals, medical imaging modalities, and clinical text.
  • HBSc, Computer Science
    Graduated with High Distinction
    University of Toronto
    Sep 2018 - Apr 2022
  • Computer Science Specialist Focusing in Artificial Intelligence
    Minors in Statistics & the History and Philosophy of Science and Technology

Courses

  • NNs & Deep Learning (CSC413)
  • Probabilistic ML (CSC412)
  • Computer Vision (CSC420)
  • NLP (CSC401)
  • KR&R (CSC486)
  • Intro to ML (CSC311)
  • Intro to AI (CSC384)
  • Methods of Data Analysis (STA302)

Publications

    Title | Authors | Publication | Year | Type
    ECG-FM: An Open Electrocardiogram Foundation Model | Kaden McKeen, Laura Oliva, Sameer Masood, Augustin Toma, Bo Wang | arXiv | 2024 | Preprint
    CyclOps: Cyclical development towards operationalizing ML models for health | Amrit Krishnan, Vallijah Subasri, Kaden McKeen, et al. | medRxiv | 2022 | Preprint

Technical Skills

Languages

  • Python
  • R
  • MATLAB
  • SQL
  • C#
  • HTML
  • LaTeX

Python Libraries

  • PyTorch
  • TensorFlow
  • Keras
  • transformers
  • diffusers
  • scikit-learn
  • NumPy
  • Pandas
  • SciPy
  • spaCy
  • Matplotlib
  • Seaborn
  • OpenCV
  • MONAI
  • Pillow
  • SQLAlchemy
  • pre-commit
  • pytest

Models

  • LLaMA-2
  • BERT
  • LDM
  • ViT
  • Transformer
  • U-Net
  • ConvNeXt
  • CNN
  • VAE
  • GAN
  • XGBoost

"All models are wrong, but some are useful" - George E. P. Box

Conferences & Seminars