Projects

Statistical work

Alongside teaching, my graduate training included substantial applied and methodological work in statistics and machine learning. These projects reflect the depth of training that informs my instruction, particularly in AP Statistics.

Featured · Independent research

2025

RandomForestSpecCheck: A Permutation-Based Random Forest Diagnostic for Linear Mixed Models

Sole author

My primary independent research: a novel, nonparametric diagnostic I designed and developed for detecting misspecification in linear mixed models (LMMs). The method combines a machine-learning measure of leftover structure with a permutation test that respects clustered data, and is packaged as an R function that returns the observed statistic, the full permutation distribution, and a clear decision.

The problem

Linear mixed models are easy to misspecify, and standard diagnostics such as residual plots, Q–Q plots, and AIC/BIC often miss subtle nonlinear or random-effects violations.

The method

Fit a random forest to the model's residuals and measure leftover structure with an out-of-bag R². A null distribution is built by permuting residuals within clusters, preserving the design without assuming normality. A dual-criteria rule flags a model only when the statistic exceeds both the 97.5th percentile of the null and a practical effect-size threshold.
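The permutation and decision steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: a simple binned R² stands in for the random forest's out-of-bag R², the function names are hypothetical, and the effect-size threshold of 0.05 is a placeholder rather than the value used in the research.

```python
import math
import random
from statistics import mean


def binned_r2(x, resid, n_bins=5):
    """Stand-in statistic (assumption): share of residual variance explained
    by bin means of x. The actual method uses a random forest's OOB R^2."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    grand = mean(resid)
    total = sum((r - grand) ** 2 for r in resid)
    if total == 0:
        return 0.0
    between = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if idx:
            m = mean(resid[i] for i in idx)
            between += len(idx) * (m - grand) ** 2
    return between / total


def permute_within_clusters(resid, clusters, rng):
    """Shuffle residuals only among observations that share a cluster."""
    groups = {}
    for i, c in enumerate(clusters):
        groups.setdefault(c, []).append(i)
    out = list(resid)
    for idx in groups.values():
        vals = [resid[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
    return out


def spec_check(x, resid, clusters, n_perm=500, min_effect=0.05, seed=0):
    """Return the observed statistic, the null distribution, and a decision."""
    rng = random.Random(seed)
    observed = binned_r2(x, resid)
    null = sorted(
        binned_r2(x, permute_within_clusters(resid, clusters, rng))
        for _ in range(n_perm)
    )
    # Empirical 97.5th percentile of the permutation distribution.
    cutoff = null[math.ceil(0.975 * n_perm) - 1]
    # Dual criteria: beat the null cutoff AND the practical threshold.
    flagged = observed > cutoff and observed > min_effect
    return {"observed": observed, "null": null, "cutoff": cutoff, "flagged": flagged}
```

Requiring both conditions keeps the rule from flagging departures that are statistically detectable but too small to matter in practice.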

Validation

Across 5,400 simulated datasets spanning 54 misspecification scenarios, the diagnostic held false-positive rates near 1–3% at a nominal 5% level and reached 80–100% power for large mean-structure departures. Applied to the Framingham Heart Study, it correctly cleared well-specified models.

R · Random Forests · Linear Mixed Models · Permutation Testing · Out-of-Bag R² · Novel Method

March 2025

A Conformal Prediction Framework for Multi-Label Movie Genre Classification

Led a project pairing a fine-tuned transformer with conformal prediction to assign calibrated sets of genre tags to films, an approach that generalizes to other multi-label problems such as medical tagging and document classification.

The problem

Movies span multiple genres, so the task is multi-label: a model must capture all relevant genres while avoiding spurious ones, and standard classifiers give point predictions with no reliability guarantee.

The method

Fine-tune DistilBERT end-to-end on the combined title, overview, and tagline, then wrap it in global conformal prediction with a sum-based non-conformity score, calibrated on held-out data to target at least 90% coverage.
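The calibration step can be illustrated with a short split-conformal sketch. It assumes one plausible reading of "sum-based": the score is the probability mass accumulated, in descending order, down to the lowest-ranked true genre, and a single global threshold is applied to every film. The project's actual score may differ, and all function names here are hypothetical. Inputs are per-genre probabilities such as sigmoid outputs from the fine-tuned model.

```python
import math


def cumulative_score(probs, true_labels):
    """Assumed non-conformity score: mass summed in descending-probability
    order until every true label has been included."""
    remaining = set(true_labels)
    if not remaining:
        return 0.0
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    total = 0.0
    for j in order:
        total += probs[j]
        remaining.discard(j)
        if not remaining:
            break
    return total


def calibrate(cal_probs, cal_labels, alpha=0.10):
    """Global threshold: conformal quantile of held-out calibration scores."""
    scores = sorted(cumulative_score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[k - 1] if k <= n else float("inf")


def predict_set(probs, q_hat):
    """Add labels in descending order while the running sum stays <= q_hat."""
    order = sorted(range(len(probs)), key=lambda j: -probs[j])
    chosen, total = [], 0.0
    for j in order:
        total += probs[j]
        if total > q_hat:
            break
        chosen.append(j)
    return chosen
```

With alpha=0.10, the predicted set contains every true genre for at least 90% of new films on average, provided calibration and test data are exchangeable. A larger threshold buys coverage at the cost of bigger sets.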

Results

Produced calibrated genre sets that balance coverage against set size, capturing nearly all true genres while controlling overprediction.

Python · PyTorch · Hugging Face · DistilBERT · Conformal Prediction · NLP

February 2025

Fine-Tuning BERT Models for Recipe Classification

Led a study evaluating transformer-based NLP models on a specialized text-classification task, comparing architectures and tuning them for the best generalization to unseen data.

The problem

Classify recipes as vegetarian from free text, where meaning hinges on subtle domain language: for example, "vegetable broth" signals vegetarian, while "chicken stock" does not.

The method

Fine-tune and compare BERT Base, BERT Large, and RoBERTa on combined description and ingredient text, with hyperparameter tuning over learning rate, weight decay, sequence length, and batch size, plus early stopping.
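The search-plus-early-stopping loop has roughly the following shape. This is a framework-free sketch: `train_one_epoch` is a hypothetical callback that trains for one epoch under a given configuration and returns the validation loss, and any grid values passed in are placeholders, not the settings used in the study.

```python
from itertools import product


class EarlyStopping:
    """Stop once validation loss has failed to improve for `patience` epochs."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def grid_search(train_one_epoch, grid, max_epochs=10, patience=2):
    """Try every combination in `grid`; keep the lowest validation loss."""
    keys = list(grid)
    best_cfg, best_loss = None, float("inf")
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        stopper = EarlyStopping(patience)
        for epoch in range(max_epochs):
            # Hypothetical callback: one epoch of training, then validation.
            val_loss = train_one_epoch(cfg, epoch)
            if stopper.step(val_loss):
                break
        if stopper.best < best_loss:
            best_cfg, best_loss = cfg, stopper.best
    return best_cfg, best_loss
```

A grid for this task would carry keys such as `learning_rate`, `weight_decay`, `max_length`, and `batch_size`. Early stopping matters here because larger models can keep lowering training loss after validation loss has turned upward.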

Results

BERT Base generalized best across more than 20,000 recipes, outperforming the larger variants on held-out test data.

Python · PyTorch · Hugging Face · BERT · RoBERTa · NLP

May 2024

Causal Analysis of Food Insecurity and Type 2 Diabetes Using NHANES

Led a causal inference study on a national health dataset, applying a doubly robust estimator and examining the limits of what cross-sectional survey data can support.

The problem

Does food insecurity causally raise the risk of developing type 2 diabetes? Observational health data make this hard to answer: confounding is likely, and there is no clear time ordering between exposure and outcome.

The method

Apply augmented inverse probability weighting (AIPW), a doubly robust estimator, to estimate the average treatment effect across three NHANES cycles (2013–2018), adjusting for sociodemographic confounders.
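The AIPW estimate itself is a simple average once the nuisance models are fitted. The sketch below assumes the propensity scores `e` and the outcome-model predictions `m1` and `m0` (under exposure and no exposure) have already been estimated elsewhere. The function name is hypothetical, and NHANES survey weights and the complex sampling design are deliberately ignored here for brevity.

```python
from math import sqrt
from statistics import mean, stdev


def aipw_ate(y, a, e, m1, m0):
    """AIPW estimate of the average treatment effect.

    y: outcomes; a: 0/1 exposure; e: estimated P(A=1 | X);
    m1, m0: outcome-model predictions under A=1 and A=0.
    Simplification (assumption): no survey weights are applied.
    """
    phi = [
        mu1 - mu0
        + ai * (yi - mu1) / ei
        - (1 - ai) * (yi - mu0) / (1 - ei)
        for yi, ai, ei, mu1, mu0 in zip(y, a, e, m1, m0)
    ]
    estimate = mean(phi)
    # Standard error from the sample spread of the per-person contributions.
    std_error = stdev(phi) / sqrt(len(phi))
    return estimate, std_error
```

The estimator is doubly robust in that it remains consistent if either the propensity model or the outcome model is correctly specified. That property does not repair the missing time ordering in cross-sectional data.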

Results

No statistically significant effect was found. The core contribution is a careful account of why cross-sectional survey data limits causal inference, making the case for longitudinal follow-up.

R · Causal Inference · AIPW · Survey Data · NHANES