MS Bioinformatics · UC San Diego · San Francisco / San Diego, CA
Work
Projects
Computational tools and research projects from the lab.
Featured
Evo2 Fine-Tuning
40B-parameter genomic foundation model on long-read gut metagenomics
Leading fine-tuning of Evo2, a 40B-parameter genomic foundation model, on PacBio HiFi long-read gut metagenomic data for strain-level functional prediction and horizontal gene transfer (HGT) network reconstruction. Ported BioNeMo training to AMD ROCm on an MI300A HPC cluster. Master's thesis project.
11.7% IBD effect-size increase. SD reduced from 0.30% to 0.02%.
Case-control matching tool for microbiome studies, built with scikit-bio and QIIME 2. Demonstrated an 11.7% increase in IBD effect size (R² 1.38 to 1.54) in the American Gut Project and a 13.5% improvement on HMP2, with substantially reduced variance.
96-taxon IBD signature identified from 3,000+ input features
QIIME 2 plugin providing VAE-based mechanistic interpretability for metagenomic and transcriptomic data. Identified a sparse signature of 96 taxa characteristic of IBD, enabling biologically interpretable dimensionality reduction from over 3,000 input features.
Nextflow pipeline processing 2,000+ cancer samples
Nextflow workflow for microbial characterization from cancer sequencing data. Analyzed 2,000+ samples across colorectal, esophageal squamous cell carcinoma, and other cancer types. Integrated human read filtration with taxonomic profiling via KrakenUniq and MetaPhlAn4.
Machine learning model that detects homologous recombination deficiency (HRD) from RNA-seq data in breast and ovarian cancer. Uses autoencoders for mechanistic interpretability to identify genes in an RNA-seq panel that are associated with favorable patient survival outcomes. Presented at AACR 2026 and BMES 2025.
Knowledge graph RAG system for microbiome literature
Microbiome-specific knowledge graph RAG system. Extracts entities (microbes, genes, metabolites, diseases) and relationships from research literature, enabling semantic querying and citation-grounded answers for metagenomic study design.
Interactive Differential Expression Analysis. Python package for differential expression analysis of gene expression data, designed as an equivalent to DESeq2. Supports standard RNA-seq workflows with visualization and statistical testing.
Particle Swarm Optimization for warehouse placement
Warehouse location optimizer using Particle Swarm Optimization. Finds an optimal placement for a set of warehouses given the locations of stores and residential areas, balancing a minimum distance from residential zones against the maximum distance to stores.
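A minimal sketch of how such an optimizer can be set up, not the project's actual code. It assumes one reading of the objective: keep every warehouse at least `min_home_dist` away from residential points (enforced as a soft penalty) while minimizing the largest distance from any store to its nearest warehouse. The function names, the penalty weight, and the swarm parameters (`inertia`, `c1`, `c2`) are all illustrative assumptions.

```python
import math
import random


def placement_cost(flat, stores, homes, min_home_dist, penalty=100.0):
    """Cost of a layout given as a flat [x0, y0, x1, y1, ...] list (assumed objective)."""
    sites = list(zip(flat[0::2], flat[1::2]))
    # Worst-case distance from any store to its nearest warehouse.
    worst_store = max(min(math.dist(s, w) for w in sites) for s in stores)
    # How far each warehouse intrudes into the residential buffer zone.
    intrusion = sum(
        max(0.0, min_home_dist - min(math.dist(w, h) for h in homes))
        for w in sites
    )
    return worst_store + penalty * intrusion


def pso_place(stores, homes, n_warehouses, min_home_dist, bounds,
              n_particles=30, n_iters=200, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over warehouse coordinates inside a square box."""
    rng = random.Random(seed)
    lo, hi = bounds
    dim = 2 * n_warehouses

    def cost(p):
        return placement_cost(p, stores, homes, min_home_dist)

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # per-particle best position
    best_cost = [cost(p) for p in pos]       # per-particle best cost
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c

    return list(zip(g_pos[0::2], g_pos[1::2])), g_cost


# Toy example: three stores, one residential point, two warehouses.
stores = [(1.0, 1.0), (9.0, 1.0), (5.0, 9.0)]
homes = [(5.0, 5.0)]
sites, best = pso_place(stores, homes, n_warehouses=2,
                        min_home_dist=2.0, bounds=(0.0, 10.0))
print(sites, best)
```

Treating the residential buffer as a penalty rather than a hard constraint keeps the search space simple; a real tool might instead reject infeasible layouts or use weighted demand per store.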
I'm an MS Bioinformatics student at UC San Diego. I build computational tools that turn raw sequencing data into biological insight.
At the Knight Lab, I develop deep learning methods that enable new metagenomic analyses, with a focus on ASD and IBD. At the Alexandrov Lab, I work on cancer genomics, developing methods to characterize tumor biology from sequencing data.
I focus on designing novel deep learning architectures that make previously intractable biological questions answerable.