My friend Chris and I decided to enter a Kaggle competition for the 2016 NCAA Tournament. March Madness is notoriously challenging to predict from a statistical point of view (see my alma mater MSU, a #2 seed, which lost earlier today in the first round to Middle Tennessee, a #15 seed). No one saw that coming. Such is the nature of March Madness.
But barring those rare upsets, we are actually pretty confident in our predictions, or at least in the basis for them. We used a Bayesian technique to generate each team's probability of winning a given game, based on a latent ability variable (composed of many “black-boxed” statistics). Our goal is to outperform a baseline model that simply picks the higher-ranked team in each round. You can see our predictions and read more about our model in the shared Dropbox folder below.
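For readers curious what "a win probability from a latent ability" can look like in code, here is a minimal sketch. It is not our actual model: it assumes a logistic link on the difference in abilities, and the function names (`win_probability`, `baseline_pick`) and the ability draws are made up for illustration.

```python
import math
import random


def win_probability(ability_a, ability_b):
    """Estimate P(team A beats team B) from paired posterior draws.

    Assumption: a logistic link on the ability difference,
    averaged over the draws. This is an illustrative choice only.
    """
    diffs = [a - b for a, b in zip(ability_a, ability_b)]
    return sum(1.0 / (1.0 + math.exp(-d)) for d in diffs) / len(diffs)


def baseline_pick(seed_a, seed_b):
    """Baseline: give all the probability to the better (lower-numbered) seed."""
    if seed_a == seed_b:
        return 0.5
    return 1.0 if seed_a < seed_b else 0.0


# Made-up posterior draws for two hypothetical teams (not real data).
rng = random.Random(0)
strong_team = [rng.gauss(1.0, 0.5) for _ in range(2000)]
weak_team = [rng.gauss(-0.5, 0.5) for _ in range(2000)]

p_model = win_probability(strong_team, weak_team)
p_baseline = baseline_pick(2, 15)

print(f"model: {p_model:.3f}, baseline: {p_baseline:.1f}")
```

The point of the contrast: the baseline is all-or-nothing, while a model like this leaves some probability on the underdog, which matters a lot on days when a #15 seed beats a #2.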