Peach Innovators 2.0

Gordan Milovac, Thu May 15 2025 • Python SQL

This project was completed as a capstone for CSCI1420: Machine Learning in Spring 2025 at Brown University. It builds on the original Peach Innovators project by introducing advanced normalization techniques to improve rowing performance analysis in the presence of uncontrolled environmental conditions.

Racing

Background

In rowing, boat speed depends on a complex mix of factors including stroke rate, power output (watts), effective stroke length, and crew synchronization. Telemetry systems like PEACH provide detailed metrics during on-water sessions, allowing technical analysis of stroke mechanics.

The original Peach Innovators project explored how watts, stroke length, and watt variance correlate with boat speed. While initial results were promising, they were hindered by environmental noise—specifically wind and tidal variation—leading to inaccurate modeling.

GitHub Repository:
https://github.com/gmilovac/PeachInnovators2.0

Final Deliverable PDF:
Peach Innovators 2.0 Final Report

Project Goal

Peach Innovators 2.0 aimed to improve performance analysis by normalizing boat speed with respect to environmental conditions such as wind and tide. By removing this noise, I was able to evaluate how rower performance variables affect boat speed under “neutral” conditions.

Methodology

The enhanced dataset consisted of ~10,000 telemetry entries gathered from Brown Men’s Crew practices. Additional data on wind (from Weather Underground) and tide (from NOAA) was manually matched to each rowing session.

To reduce complexity and improve modeling:

Wind was categorized into: tailwind, headwind, and crosswind
Tide was categorized into: with, against, and slack

These were mapped to each rowing piece by direction and time of day.

After multiple trials, the most accurate normalization method:

Normalized each rower’s speed individually
Then averaged normalized speeds per boat

This hybrid method preserved individual differences while capturing uniform environmental impact.

I then re-ran our statistical and ML tests from the original project using:

OLS regression
Pearson correlation
MLP regression
XGBoost models

Data

Results and Analysis

Pearson Correlation (Before vs. After Normalization)

Variable	Raw Speed Correlation	Normalized Speed Correlation
Average Watts	0.686	0.658
Watt Variance	-0.200	0.005
Effective Length	0.292	0.278

Watt Variance lost significance after normalization → previously amplified by environmental effects.
Average Watts remained the strongest predictor of boat speed.
Effective Length remained a moderate but consistent factor.

Machine Learning

MLP RMSE improved from 0.5600 → 0.4954
OLS RMSE improved from 0.285 → 0.119
MSE decreased significantly (from 0.081 → 0.014)

These results confirmed that normalization substantially improved model stability and predictive accuracy.

Table

What Worked and What Didn't

Successes:

Environmental normalization significantly improved performance clarity
Enhanced ability to isolate rower-driven effects on boat speed
Robust categorical encoding of wind/tide data

Limitations:

Incomplete historical gust/wind data
Tidal strength estimates lacked precision
Practice session variation (e.g. rate caps, intensity, lineup) remained uncontrolled
High multicollinearity among predictors (condition numbers > 30,000)

Conclusion

Peach Innovators 2.0 successfully demonstrated the value of environmental normalization in rowing performance analysis. While average watts remained the only consistently significant predictor of boat speed, this project showed that effective length has stable influence, and that watt variance may be overstated when not accounting for external conditions.

Despite limitations, the normalization pipeline introduced here offers a valuable tool for more accurate performance modeling—and opens the door for future research into fatigue, synchronization, and more refined sensor data.

Graph

Future Work

Integrate real-time environmental sensors
Improve gust/tide accuracy using higher-frequency data
Explore ensemble models and LSTM-based time-series architectures
Add synchronization and fatigue modeling to better capture performance dynamics

Author & Contributions

Gordan Milovac (gmilovac): Project design, data gathering, environmental normalization modeling, ML implementation, report writing, and visualizations.