Projects
Statistical work
Alongside teaching, my graduate training included substantial applied and methodological work in statistics and machine learning. These projects reflect the depth of training that informs my instruction, particularly in AP Statistics.
2025
RandomForestSpecCheck: A Permutation-Based Random Forest Diagnostic for Linear Mixed Models
My primary independent research: a novel, nonparametric diagnostic I designed and developed for detecting misspecification in linear mixed models (LMMs). The method combines a machine-learning measure of leftover structure with a permutation test that respects clustered data, and is packaged as an R function that returns the observed statistic, the full permutation distribution, and a clear decision.
The problem
Linear mixed models are easy to misspecify, and standard diagnostics such as residual plots, Q–Q plots, and AIC/BIC often miss subtle nonlinear or random-effects violations.
The method
Fit a random forest to the model's residuals and measure leftover structure with an out-of-bag R². A null distribution is built by permuting residuals within clusters, preserving the design without assuming normality. A dual-criteria rule flags a model only when the statistic exceeds both the 97.5th percentile of the null and a practical effect-size threshold.
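The permutation step and the dual-criteria rule can be sketched in a few lines of Python. This is a minimal illustration, not the R function itself: the names `permute_within_clusters` and `spec_check`, the 0.05 effect-size threshold, and the 500 permutations are all assumptions, and a squared correlation with one covariate stands in for the random forest's out-of-bag R² so the example runs with the standard library alone.

```python
import random
from statistics import mean

def permute_within_clusters(values, clusters, rng):
    """Shuffle values only among observations that share a cluster label."""
    out = list(values)
    members = {}
    for i, c in enumerate(clusters):
        members.setdefault(c, []).append(i)
    for idx in members.values():
        shuffled = [values[i] for i in idx]
        rng.shuffle(shuffled)
        for i, v in zip(idx, shuffled):
            out[i] = v
    return out

def r_squared(x, y):
    """Squared Pearson correlation: a stand-in for the forest's out-of-bag R^2."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy ** 2 / (sxx * syy)

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an ascending list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def spec_check(x, residuals, clusters, n_perm=500, q=0.975, min_effect=0.05, seed=0):
    """Return the observed statistic, the permutation null, and the decision."""
    rng = random.Random(seed)
    observed = r_squared(x, residuals)
    null = sorted(
        r_squared(x, permute_within_clusters(residuals, clusters, rng))
        for _ in range(n_perm)
    )
    cutoff = percentile(null, q)
    # Dual criteria: beat the null percentile AND the practical threshold.
    flagged = observed > cutoff and observed > min_effect
    return {"observed": observed, "null": null, "cutoff": cutoff, "flagged": flagged}

# Toy data (made up): 10 clusters of 10, residuals that still depend on x.
noise = random.Random(1)
clusters = [c for c in range(10) for _ in range(10)]
x = [float(j) for _ in range(10) for j in range(10)]
residuals = [0.5 * xi + noise.gauss(0, 0.5) for xi in x]

result = spec_check(x, residuals, clusters)
print(result["observed"], result["cutoff"], result["flagged"])
```

Shuffling only within clusters keeps each cluster's set of residuals intact, so the null distribution reflects the grouped design rather than treating observations as independent.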
Validation
Across 5,400 simulated datasets spanning 54 misspecification scenarios, the diagnostic held false-positive rates near 1–3% at a nominal 5% level and reached 80–100% power for large mean-structure departures. Applied to the Framingham Heart Study, it correctly cleared well-specified models.
March 2025
A Conformal Prediction Framework for Multi-Label Movie Genre Classification
Led a project pairing a fine-tuned transformer with conformal prediction to assign calibrated sets of genre tags to films, an approach that generalizes to other multi-label problems such as medical tagging and document classification.
The problem
Movies span multiple genres, so the task is multi-label: a model must capture all relevant genres while avoiding spurious ones, and standard classifiers give point predictions with no reliability guarantee.
The method
Fine-tune DistilBERT end-to-end on the combined title, overview, and tagline, then wrap it in global conformal prediction with a sum-based non-conformity score, calibrated on held-out data to target at least 90% coverage.
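The calibration step can be illustrated with a small Python sketch. It shows one plausible reading of a sum-based score, which may differ from the score the project used: rank genres by predicted probability, and take as the score the cumulative normalized probability needed to reach every true genre. A single global threshold is then the conformal quantile of the calibration scores. The function names and the simulated probabilities are assumptions for illustration; no DistilBERT model is involved here.

```python
import math
import random

def ranked_cumulative(probs):
    """Labels sorted by descending probability, each with its running normalized mass."""
    total = sum(probs.values())
    running, out = 0.0, []
    for label in sorted(probs, key=probs.get, reverse=True):
        running += probs[label] / total
        out.append((label, running))
    return out

def nonconformity(probs, true_labels):
    """Mass accumulated down the ranking by the time every true label is included."""
    score = 0.0
    for label, mass in ranked_cumulative(probs):
        if label in true_labels:
            score = mass
    return score

def conformal_threshold(scores, alpha=0.10):
    """Finite-sample quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if k > n else sorted(scores)[k - 1]

def predict_set(probs, threshold):
    """Add labels in rank order until the running mass reaches the threshold."""
    chosen = []
    for label, mass in ranked_cumulative(probs):
        chosen.append(label)
        if mass >= threshold:
            break
    return chosen

# Simulated stand-in for model outputs (made-up data, hypothetical genre list).
GENRES = ["drama", "comedy", "action", "horror", "romance"]
rng = random.Random(0)

def simulate(n):
    data = []
    for _ in range(n):
        probs = {g: rng.uniform(0.05, 0.95) for g in GENRES}
        truth = {g for g in GENRES if rng.random() < probs[g]}
        if not truth:
            truth = {max(probs, key=probs.get)}
        data.append((probs, truth))
    return data

calibration, test_data = simulate(300), simulate(300)
threshold = conformal_threshold([nonconformity(p, t) for p, t in calibration])

covered = sum(t <= set(predict_set(p, threshold)) for p, t in test_data)
avg_size = sum(len(predict_set(p, threshold)) for p, _ in test_data) / len(test_data)
print(covered / len(test_data), avg_size)
```

Under exchangeability of calibration and test examples, a set built this way contains all true genres with probability at least 1 − alpha; the average set size shows what that coverage costs.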
Results
Produced calibrated genre sets that balance coverage against set size, capturing nearly all true genres while controlling overprediction.
February 2025
Fine-Tuning BERT Models for Recipe Classification
Led a study evaluating transformer-based NLP models on a specialized text-classification task, comparing architectures and tuning them for the best generalization to unseen data.
The problem
Classify recipes as vegetarian or non-vegetarian from free text, where meaning hinges on subtle domain language: for example, "vegetable broth" signals vegetarian while "chicken stock" does not.
The method
Fine-tune and compare BERT Base, BERT Large, and RoBERTa on combined description and ingredient text, with hyperparameter tuning over learning rate, weight decay, sequence length, and batch size, plus early stopping.
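A rough sketch of how such a search and the early-stopping rule might be organized, in plain Python. The grid values, the patience of 2, and the validation losses are hypothetical placeholders, not the settings or results from the study, and the actual fine-tuning call is omitted.

```python
from itertools import product

# Hypothetical search space over the four tuned hyperparameters.
search_space = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01],
    "max_seq_length": [128, 256],
    "batch_size": [16, 32],
}

def grid(space):
    """Yield one config dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) given per-epoch validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the checkpoint from best_epoch is the one to keep.
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

configs = list(grid(search_space))
print(len(configs), configs[0])

# Made-up validation losses for one run, only to exercise the stopping rule.
best_epoch, stopped_epoch = early_stopping([0.90, 0.70, 0.60, 0.65, 0.66, 0.50])
print(best_epoch, stopped_epoch)
```

Selecting on validation loss and reporting only on a separate held-out test set keeps the comparison between architectures honest.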
Results
BERT Base generalized best across more than 20,000 recipes, outperforming the larger variants on held-out test data.
May 2024
Causal Analysis of Food Insecurity and Type 2 Diabetes Using NHANES
Led a causal inference study on a national health dataset, applying a doubly robust estimator and examining the limits of what cross-sectional survey data can support.
The problem
Does food insecurity causally raise the risk of developing type 2 diabetes? Observational health data makes this difficult, with confounding and no clear time ordering between exposure and outcome.
The method
Apply augmented inverse probability weighting (AIPW), a doubly robust estimator, to estimate the average treatment effect across three NHANES cycles (2013–2018), adjusting for sociodemographic confounders.
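The AIPW point estimate itself is short enough to write out. The sketch below assumes the nuisance estimates are already available: a propensity score `e` for each person and outcome-model predictions `m1` and `m0` under exposure and no exposure. The function name, the argument names, and the toy numbers are illustrative assumptions; model fitting, survey weights, and the NHANES data are all left out.

```python
from math import sqrt
from statistics import mean, stdev

def aipw_ate(y, a, e, m1, m0):
    """Augmented inverse probability weighting estimate of the average treatment effect.

    y  : observed outcomes
    a  : exposure indicators (1 = exposed, 0 = unexposed)
    e  : estimated propensity scores P(A = 1 | X), strictly between 0 and 1
    m1 : outcome-model predictions E[Y | A = 1, X]
    m0 : outcome-model predictions E[Y | A = 0, X]
    Returns (estimate, standard error from the per-unit contributions).
    """
    phi = [
        mu1 - mu0
        + ai * (yi - mu1) / ei
        - (1 - ai) * (yi - mu0) / (1 - ei)
        for yi, ai, ei, mu1, mu0 in zip(y, a, e, m1, m0)
    ]
    return mean(phi), stdev(phi) / sqrt(len(phi))

# Toy inputs (made up) to show the call shape.
y = [1, 0, 1, 0, 1, 0]
a = [1, 1, 0, 0, 1, 0]
e = [0.6, 0.5, 0.4, 0.5, 0.7, 0.3]
m1 = [0.7, 0.4, 0.6, 0.3, 0.8, 0.2]
m0 = [0.5, 0.3, 0.5, 0.2, 0.6, 0.1]

ate, se = aipw_ate(y, a, e, m1, m0)
print(ate, se)
```

The estimator is doubly robust in the sense that it remains consistent if either the propensity model or the outcome model is correctly specified. That protects against model misspecification, not against unmeasured confounding or the missing time ordering in cross-sectional data.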
Results
No statistically significant effect was found. The core contribution is a careful account of why cross-sectional survey data limits causal inference, making the case for longitudinal follow-up.