Projects Overview

All Projects

PGA Tour: Spatial Modeling

In this project, I’m developing a spatial model that predicts, with uncertainty, how many strokes a professional golfer needs to hole out from any location on the hole, using 1,764 real shots from 139 players on Hole 1 of the 2023 Players Championship at TPC Sawgrass. The raw shot coordinates came in an undocumented coordinate system, so I reverse-engineered the correct transformation using ArcGIS, built the full hole map from official shapefiles, and spatially joined every shot to precise boundaries (fairway, rough, bunkers, trees, green, fringe). I then fit both linear mixed-effects models (with player random intercepts) and an anisotropic 2D Matérn Gaussian process (Vecchia approximation in R’s GpGp package) to produce smooth, interpretable heatmaps of expected strokes remaining across the entire hole. This work demonstrates difficult spatial data wrangling, coordinate system transformations, advanced mixed-effects and Gaussian process modeling, and turning complex geospatial data into actionable, visual insights. If you need someone who can clean, map, and model real-world location data end-to-end, this is a strong example.
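As a rough illustration of what "anisotropic Matérn" means here, the sketch below computes a Matérn covariance between two shot locations with a separate range for each axis. It is written in Python for readability (the project itself used R), and it rests on assumptions that are not from the project: smoothness fixed at 3/2, axis-aligned anisotropy with no rotation, and made-up range values. The function name and parameters are hypothetical and are not the GpGp API.

```python
import math


def matern32_anisotropic(p, q, variance=1.0, range_x=30.0, range_y=10.0):
    """Matern covariance (smoothness 3/2) between 2D points p and q.

    Each axis is divided by its own range before the distance is taken,
    so correlation can decay faster in one direction than the other.
    All default values are illustrative placeholders, not fitted values.
    """
    dx = (p[0] - q[0]) / range_x
    dy = (p[1] - q[1]) / range_y
    scaled_distance = math.sqrt(dx * dx + dy * dy)
    s = math.sqrt(3.0) * scaled_distance
    return variance * (1.0 + s) * math.exp(-s)


# Hypothetical locations: the same 10-unit offset, along x and along y.
origin = (0.0, 0.0)
along_x = matern32_anisotropic(origin, (10.0, 0.0))
along_y = matern32_anisotropic(origin, (0.0, 10.0))
```

With a longer range along x than along y, the same physical offset leaves two points more strongly correlated along x, which is the kind of behavior one might expect along the line of play versus across it.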

NFL Big Data Bowl 2026 Prediction Competition

In the NFL Big Data Bowl 2026 Prediction track, I developed a custom transformer model to forecast player positions frame-by-frame during the air phase of downfield passes, using pre-release Next Gen Stats tracking data, targeted receiver information, and the ball’s estimated landing location. The project achieved strong performance on the competition’s masked RMSE metric while producing realistic, interpretable trajectories useful for scouting and coverage analysis.
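For readers unfamiliar with a masked RMSE, here is a minimal Python sketch of the general idea: only the positions flagged for scoring contribute to the error. The exact competition definition may differ; the averaging over both x and y coordinates, the function name, and the data layout below are assumptions made for illustration.

```python
import math


def masked_rmse(predicted, actual, mask):
    """RMSE over (x, y) positions, counting only entries where mask is True.

    predicted, actual: sequences of (x, y) tuples, one per player-frame.
    mask: sequence of booleans marking which player-frames are scored.
    """
    squared_error = 0.0
    count = 0
    for (px, py), (ax, ay), keep in zip(predicted, actual, mask):
        if not keep:
            continue  # unscored player-frames are ignored entirely
        squared_error += (px - ax) ** 2 + (py - ay) ** 2
        count += 2  # one x term and one y term
    if count == 0:
        raise ValueError("mask selects no entries")
    return math.sqrt(squared_error / count)
```

A prediction that is badly wrong on a masked-out player-frame has no effect on the score, so the model is only judged on the players and frames the metric selects.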

MLB Baseball Performance vs Salary

I explored the relationship between MLB player performance and salary using data scraped from baseball-reference.com. I built a Python package to clean the data, visualize it, and run regression analyses, then presented my findings in a report and a Streamlit app.

NBA Scouting Report: Data Collection Project

I created a data-driven scouting report that pinpoints international prospects who specifically fix the Sacramento Kings’ biggest weaknesses after a 17-year playoff drought. Starting from mock data, I leveled it up by web-scraping real 2020-21 NBA stats from Basketball-Reference using Python, Selenium, and BeautifulSoup, defeating JavaScript lazy-loading that static methods couldn’t handle. I cleaned and merged the data, engineered advanced metrics, and built a weighted “Kings Fit Score” focused on rebounding, playmaking, defensive impact, and efficiency. The result: a ranked list of hidden-gem prospects who outperform the current roster exactly where Sacramento needs it most. This project showcases end-to-end web scraping, robust data cleaning, domain-driven algorithm design, clear visualization, and ethical data practices. If you’re looking for someone who can independently turn messy, real-world data into actionable insights—especially in sports analytics or high-impact decision support—this is a great example of how I work.
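To give a sense of how a weighted fit score of this kind can work, here is a minimal Python sketch. It assumes min-max scaling of each metric followed by a weighted average; the metric names, weights, and prospects are placeholders invented for illustration and are not the actual "Kings Fit Score" formula or data.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # no spread, so no one stands out
    return [(v - lowest) / (highest - lowest) for v in values]


def fit_scores(players, weights):
    """Return a weighted average of scaled metrics for each player.

    players: dict of name -> dict of metric -> raw value.
    weights: dict of metric -> non-negative weight (hypothetical values).
    """
    names = list(players)
    totals = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        scaled = min_max_scale([players[name][metric] for name in names])
        for name, value in zip(names, scaled):
            totals[name] += weight * value
    weight_sum = sum(weights.values())
    return {name: total / weight_sum for name, total in totals.items()}


# Placeholder prospects and weights, invented for this example only.
example_players = {
    "Prospect A": {"rebounds": 9.0, "assists": 3.0},
    "Prospect B": {"rebounds": 6.0, "assists": 7.0},
    "Prospect C": {"rebounds": 4.0, "assists": 2.0},
}
example_weights = {"rebounds": 2.0, "assists": 1.0}
ranking = sorted(
    fit_scores(example_players, example_weights).items(),
    key=lambda item: item[1],
    reverse=True,
)
```

Scaling each metric before weighting keeps a stat with large raw numbers from dominating the score, so the weights alone express which needs matter most.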

Graduation Project

Coming soon!