Hi, I'm Leo Joseph

I build computational tools at the intersection of deep learning and genomics: microbiome analysis, cancer genomics, and multi-omics integration.

Outside of school, I like to drink coffee, take photos, and play sports.

MS Bioinformatics · UC San Diego · San Francisco / San Diego, CA

Work

Projects

Computational tools and research projects from the lab.

Featured

Evo2 Fine-Tuning

40B-parameter genomic foundation model on long-read gut metagenomics

Leading the fine-tuning of Evo2, a 40B-parameter genomic foundation model, on PacBio HiFi long-read gut metagenomic data for strain-level functional prediction and horizontal gene transfer (HGT) network reconstruction. Ported BioNeMo training to AMD ROCm on an MI300A HPC cluster. This is my master's thesis project.

Python · PyTorch · Evo2 · ROCm · Kubernetes · metagenomics
View on GitHub
Featured

Qupid

11.7% IBD effect-size increase. SD reduced from 0.30% to 0.02%.

Case-control matching tool for microbiome studies, built with Scikit-Bio and QIIME 2. Demonstrated an 11.7% increase in IBD effect size (R² 1.38 to 1.54) in the American Gut Project and a 13.5% improvement on HMP2, with substantially reduced variance.

Python · Scikit-Bio · QIIME 2 · microbiome · statistics
View on GitHub
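
For readers unfamiliar with case-control matching, here is a minimal sketch of the general idea in plain Python. It is not Qupid's actual API: the function names (`match_controls`, `assign_one_to_one`), the metadata fields, and the tolerance values are all hypothetical, and the greedy pairing step is just one simple way to pick a one-to-one assignment.

```python
def match_controls(cases, controls, exact, tolerances):
    """Return {case_id: [control_ids]} listing the controls eligible for each case.

    exact: metadata keys that must be identical (hypothetical example: "sex").
    tolerances: {key: max absolute difference} for numeric metadata (e.g. "age").
    """
    matches = {}
    for case_id, case in cases.items():
        eligible = []
        for ctrl_id, ctrl in controls.items():
            same_category = all(ctrl[k] == case[k] for k in exact)
            within_tol = all(abs(ctrl[k] - case[k]) <= tol
                             for k, tol in tolerances.items())
            if same_category and within_tol:
                eligible.append(ctrl_id)
        matches[case_id] = eligible
    return matches


def assign_one_to_one(matches):
    """Greedily give each case one unused control, hardest-to-match cases first."""
    used = set()
    pairs = {}
    for case_id in sorted(matches, key=lambda c: len(matches[c])):
        for ctrl_id in matches[case_id]:
            if ctrl_id not in used:
                pairs[case_id] = ctrl_id
                used.add(ctrl_id)
                break
    return pairs


# Toy metadata, invented purely for illustration.
cases = {"S1": {"sex": "F", "age": 30}, "S2": {"sex": "M", "age": 52}}
controls = {
    "C1": {"sex": "F", "age": 33},
    "C2": {"sex": "F", "age": 41},
    "C3": {"sex": "M", "age": 50},
    "C4": {"sex": "M", "age": 55},
}
eligible = match_controls(cases, controls, exact=("sex",), tolerances={"age": 5})
pairs = assign_one_to_one(eligible)
```

Matching cases to controls on confounders such as age and sex before comparing microbiomes is what lets the disease signal stand out from demographic noise.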
Featured

q2-mechinterp

96-taxon IBD signature identified from 3,000+ input features

QIIME 2 plugin providing VAE-based mechanistic interpretability for metagenomic and transcriptomic data. Identified a sparse signature of 96 taxa characteristic of IBD, enabling biologically interpretable dimensionality reduction from over 3,000 input features.

Python · QIIME 2 · VAE · PyTorch · interpretability
View on GitHub
Featured

CMPipeline

Nextflow pipeline processing 2,000+ cancer samples

Nextflow workflow for microbial characterization from cancer sequencing data. Analyzed 2,000+ samples across colorectal, esophageal squamous cell carcinoma, and other cancer types. Integrated human read filtration with taxonomic profiling via KrakenUniq and MetaPhlAn4.

Nextflow · R · metagenomics · cancer · KrakenUniq
View on GitHub
Featured

softHRD

AACR 2026 poster. RNA-seq ML for HRD detection.

Machine learning model that detects homologous recombination deficiency (HRD) from RNA-seq data in breast and ovarian cancer. Uses autoencoders for mechanistic interpretability to identify genes in an RNA-seq panel that are associated with favorable patient survival outcomes. Presented at AACR 2026 and BMES 2025.

Python · RNA-seq · Cancer Biology · autoencoder · ML
View on GitHub

knightGPT

Knowledge graph RAG system for microbiome literature

Microbiome-specific knowledge graph RAG system. Extracts entities (microbes, genes, metabolites, diseases) and relationships from research literature, enabling semantic querying and citation-grounded answers for metagenomic study design.

Python · LLM · Graph-RAG · Ollama · NLP
View on GitHub

IDEA

Python equivalent to DESeq2

Interactive Differential Expression Analysis (IDEA): a Python package for differential expression analysis of gene expression data, designed as an equivalent to DESeq2. Supports standard RNA-seq workflows with visualization and statistical testing.

Python · DESeq2 · RNA-seq · packaging
View on GitHub

Hive Mind Optimization

Particle Swarm Optimization for warehouse placement

Warehouse location optimizer using Particle Swarm Optimization. Finds optimal placement for a set of warehouses given stores and residential areas, balancing minimum distance from residential zones against maximum distance to stores.

Python · optimization · PSO · algorithms
View on GitHub
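
To give a feel for how Particle Swarm Optimization applies to this kind of placement problem, here is a minimal standard-library sketch. It is not the project's code. The `cost` function is one assumed reading of the objective (stay at least `min_home_dist` from every home and within `max_store_dist` of every store), and the penalty weight, swarm parameters, and coordinates are all made up for illustration.

```python
import math
import random


def cost(position, stores, homes, min_home_dist=2.0, max_store_dist=6.0):
    """Penalty for one layout; position is a flat [x1, y1, x2, y2, ...] list."""
    warehouses = [(position[i], position[i + 1]) for i in range(0, len(position), 2)]
    total = 0.0
    for s in stores:
        # Each store should be within max_store_dist of its nearest warehouse.
        nearest = min(math.dist(s, w) for w in warehouses)
        total += nearest + 10.0 * max(0.0, nearest - max_store_dist)
    for h in homes:
        # Each warehouse should stay at least min_home_dist away from every home.
        for w in warehouses:
            total += 10.0 * max(0.0, min_home_dist - math.dist(h, w))
    return total


def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's personal best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy map on a 10 x 10 grid: place two warehouses (so dim = 4).
stores = [(1.0, 1.0), (2.0, 8.0), (8.0, 3.0), (9.0, 9.0)]
homes = [(5.0, 5.0), (4.0, 6.0)]
layout, penalty = pso(lambda p: cost(p, stores, homes), dim=4, bounds=(0.0, 10.0))
```

Encoding all warehouse coordinates in one flat vector lets a single particle represent a complete layout, so the swarm searches over whole placements rather than one warehouse at a time.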
Toolbox

Technologies

Languages, frameworks, and tools I work with.

Languages

Python · R · Bash · C · C++ · Java · JavaScript · Rust · SQL

ML / Data Science

PyTorch · TensorFlow · scikit-learn · JAX

Bioinformatics

QIIME 2 · BLAST · Bowtie · BWA · DESeq2 · GATK · SAMtools · STAR · BIRDMAn · HMMER

Workflow / Infra

Nextflow · Snakemake · SLURM · Docker · Kubernetes · AWS · Azure · Git

Visualization

ggplot2 · matplotlib · seaborn · D3.js

Databases / Web

MongoDB · Flask · React · SvelteKit
About Me

Background

I'm an MS Bioinformatics student at UC San Diego. I build computational tools that turn raw sequencing data into biological insight.

At the Knight Lab, I develop deep learning methods to unlock new metagenomic analyses, with a focus on autism spectrum disorder (ASD) and inflammatory bowel disease (IBD). At the Alexandrov Lab, I work on cancer genomics, developing methods for characterizing tumor biology from sequencing data.

My focus is on designing novel deep learning architectures that make previously intractable biological questions answerable.

Education

University of California, San Diego

MS Bioengineering: Bioinformatics

Expected June 2027

University of California, San Diego

BS Bioengineering: Bioinformatics

June 2026

GPA: 3.7 · Citron-Chien Fellow

Research Experience

Knight Lab · Prof. Rob Knight

UC San Diego · Dec 2023 – Present

Microbiome, ASD, IBD

Alexandrov Lab · Prof. Ludmil Alexandrov

UC San Diego · Jun 2024 – Present

Cancer genomics, HRD detection

Bansal Lab · Prof. Vikas Bansal

UC San Diego · Sep 2024 – Jun 2025

CNV detection from WES

Industry Experience

Kaiser Permanente

Business Process Intern · Jun – Sep 2023

Clear Labs

R&D Intern · Jun – Aug 2021

Cisco

Software Engineer Intern · Jun – Aug 2021

Relevant Coursework

  • Design & Analysis of Algorithms
  • Applied Genomic Technologies
  • Advanced Bioinformatics Lab
  • Supervised Machine Learning Algorithms
Say Hello

Get in Touch

Feel free to reach out.

Email me directly:

l1joseph@ucsd.edu

© 2026 Leo Joseph. All rights reserved. · last updated Apr 2026