
A machine learning (ML) researcher focused on clinical applications; 6+ years of research & industry experience.
Currently performing research in WangLab whilst completing a PhD at the University of Toronto.
Research Experience
ECG Foundation Model
Building a foundation model for ECG diagnosis with state-of-the-art contrastive pretraining on 1.7 million ECGs (800k from UHN's internal dataset).
Technologies
- PyTorch
- BERT
- Contrastive pretraining
SMaRTBRAIN - Dementia Classification
Applying multimodal and longitudinal techniques to neurodegenerative disease diagnosis & prognosis, using 3D brain perfusion scintigraphy & MRI data from Sunnybrook.
Technologies
- PyTorch
- MONAI
- Hugging Face
Deformable Image Registration
Developed an unsupervised 3D U-Net model variant for deformable, pairwise image registration of whole-brain MRI data under Dr. Maged Goubran & Ahmadreza Attarpour.
Technologies
- PyTorch
- MONAI
- TensorFlow
- VoxelMorph
- Compute Canada
- Neurite
- ITK-SNAP
UofT AI Director of Education
Led LearnAI, a free course teaching ML to undergrads, as the club Director of Education.
Acted as curriculum development lead, administrative coordinator, lecturer, TA, and project mentor.
Partnered with AI Commons to offer the course to people in emerging countries.
Work Experience
Machine Learning Researcher
Performed machine learning for health research at U of T's WangLab.
Performed multi-source domain adaptation & generalization for MRI, with test-time input adaptation (conditional image generation) using content-style disentanglement.
Developed a subject-adaptive framework for fMRI-based visual stimuli reconstruction using Versatile Diffusion, a multimodal latent diffusion model. Encouraged subject-invariant latent spaces using a subject classifier, a Gradient Reversal Layer, and ArcFace loss.
Began the SMaRTBRAIN & ECG foundation model projects, carrying into the PhD.
Topics
- Domain adaptation
- Content-style disentanglement
- Loss-conditioned DDPM
- Visual stimuli reconstruction
- Representation learning
- Latent diffusion model
Applied Machine Learning Intern
Core developer on CyclOps, an open-source evaluation framework for ML research and deployment in healthcare, as a member of Vector’s AI engineering team.
Researched two risk-prediction use cases utilizing CyclOps: led a Vector-WangLab research collaboration predicting cardiovascular risk and all-cause mortality on the GEMINI dataset, and contributed to internal Vector research on all-cause mortality and length of stay in the ER on the MIMIC-IV dataset.
Developed generalizable, high-level querying and processor APIs, consisting of general task-based pipelines to query, clean, featurize, normalize, aggregate, impute, and vectorize tabular and temporal clinical data.
Technologies
- PyTorch
- scikit-learn
- MLflow
- NumPy
- Pandas
- SQLAlchemy
- SLURM
Data Scientist
Developed a statistical methods & analysis package, consisting of end-to-end workflows to perform climate timeseries forecasting, based on modern statistical research.
Workflows
- EDA, reporting, and cleaning of spatio-temporal timeseries data
- Forecasting methods using climate simulations & observed re-analysis data
- Statistical analysis of temperature & precipitation data
Technologies
- SciPy
- NumPy
- Pandas
- Matplotlib
- xarray
- GCP
- BigQuery
- Jupyter Notebook
Machine Learning Engineer
Developed computer vision pipelines to help gym users improve their workout form:
- Use deep-learning pose estimation to predict an exercise being performed
- Detect and classify repetitive motion as repetitions of an exercise
- Track multiple users using facial recognition and efficient, real-time object tracking
Education
- PhD, Laboratory Medicine & Pathobiology, University of Toronto, Sep 2023 - Present
- HBSc, Computer Science, Graduated with High Distinction, University of Toronto, Sep 2018 - Apr 2022
Minors in Statistics & the History and Philosophy of Science and Technology
Courses
- NNs & Deep Learning (CSC413)
- Probabilistic ML (CSC412)
- Computer Vision (CSC420)
- NLP (CSC401)
- KR&R (CSC486)
- Intro to ML (CSC311)
- Intro to AI (CSC384)
- Methods of Data Analysis (STA302)
Publications
Title | Authors | Publication | Year | Type |
---|---|---|---|---|
ECG-FM: An Open Electrocardiogram Foundation Model. | Kaden McKeen, Laura Oliva, Sameer Masood, Augustin Toma, Bo Wang. | arXiv | 2024 | Preprint |
CyclOps: Cyclical development towards operationalizing ML models for health. | Amrit Krishnan, Vallijah Subasri, Kaden McKeen, et al. | medRxiv | 2022 | Preprint |
Technical Skills
Languages
- Python
- R
- MATLAB
- SQL
- C#
- HTML
- LaTeX
Python Libraries
- PyTorch
- TensorFlow
- Keras
- transformers
- diffusers
- scikit-learn
- NumPy
- Pandas
- SciPy
- spaCy
- Matplotlib
- Seaborn
- OpenCV
- MONAI
- Pillow
- SQLAlchemy
- pre-commit
- pytest
Models
- LLaMA-2
- BERT
- LDM
- ViT
- Transformer
- U-Net
- ConvNeXt
- CNN
- VAE
- GAN
- XGBoost
"All models are wrong, but some are useful" - George E. P. Box
Conferences & Seminars
- AI in Healthcare 2023, Vector Institute, UofT AI, Seminar presentation
-