
Horse Racing Modeling & Datacapping

There is a really interesting discussion going on in the world around us about data modeling. The "geeks" have been having a fun time extolling the virtues of modeling since the Presidential election returns came in. Those models, which many of us have been following since the 1990s, took a bit of a beating in the media in 2000 and 2004 (some were wrong, working from suspect polling data), but they were back with a vengeance this year. Most did really well - Sam Wang of Princeton and Nate Silver being the most popular - and if you aggregated the aggregators you got an even better result. For the record, the best result in terms of electoral vote (EV) predictions for polling modeling was Drew Linzer of Emory University. There's a sharp fellow.

Yes, the RealClearPolitics polling average was correct for everything but Florida - which they had close to a tie - so a guy with a grade 4 education and an internet connection could have made a solid prediction. Still, the modelers' accuracy, built on this year's excellent polling data, was very, very good. No one, not even the harshest skeptic, can deny that.

It should be noted that things are different for a bettor, though, and there are limitations. For example, in late September Sam Wang had a 74% prediction that the D's would take the House. This was way off, and the betting markets did not bite on that prognostication, never really making it more than a 25% chance. Models are only as good as the data that goes into them, and if you want to bet pure percentage chances, beware. When you lose, you lose big (it happens more often than you think), and you need a lot of wins to make up for it.

In horse racing, data modeling has been the poor cousin of pure handicapping for a long time. The modelers are geeks, and geeks don't watch horses work out, or get a "feel for a race". Geeks are discounted.

Betting models and the geeks are doing some very good things, though. They seek value, I would argue, in a much better (and more profitable) way than a human being can.

For a betting model to work in horse racing, it has to focus less on prediction value and more on betting value. If we want a prediction model we have a very simple out, of course: crowdsource. The tote board is the best model there is, and it's been this way since horse racing allowed betting, or probably before. If 740 people out of 1000 in the grandstand in 1895 in a small town in North Carolina liked a horse to win, that was probably a better predictor than any other piece of data we'd use. But if you use predictive value alone to guide your bets, I am pretty sure you like the ATM.
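To put rough numbers on the tote board as a prediction model, here is a minimal Python sketch. The four-horse field, its odds, and the 25% model estimate are all made up for illustration, and it ignores breakage and late odds changes. It turns the posted odds into the crowd's win probabilities and shows that if the crowd is exactly right, every horse is a losing bet by the same margin; only a horse whose true chance is better than its price shows a profit.

```python
def implied_probabilities(odds_to_one):
    """Convert tote odds (e.g. 4.0 for 4-1) into the crowd's win probabilities.

    The raw 1 / (odds + 1) values add up to more than 1 because of the
    takeout, so they are normalized to sum to exactly 1.
    """
    raw = [1.0 / (o + 1.0) for o in odds_to_one]
    total = sum(raw)
    return [r / total for r in raw]


def expected_return_per_dollar(win_prob, odds_to_one):
    """Expected profit on a $1 win bet: collect odds_to_one with probability
    win_prob, lose the $1 stake the rest of the time."""
    return win_prob * odds_to_one - (1.0 - win_prob)


# Hypothetical four-horse field, odds expressed as X-1.
odds = [1.0, 3.0, 4.0, 9.0]
crowd = implied_probabilities(odds)

# If the crowd's probabilities are the true ones, every bet loses the same
# share of each dollar (the takeout).
crowd_returns = [expected_return_per_dollar(p, o) for p, o in zip(crowd, odds)]

# Assumed model estimate: the 4-1 horse really wins 25% of the time,
# versus the roughly 19% the crowd gives it.
model_return = expected_return_per_dollar(0.25, 4.0)

for o, p, r in zip(odds, crowd, crowd_returns):
    print(f"{o:.0f}-1  crowd prob {p:.3f}  expected return {r:+.3f}")
print(f"4-1 horse at a 25% true chance: expected return {model_return:+.3f}")
```

Agreeing with the board on who is most likely to win gets you the takeout and nothing more. The bet only makes sense when your number and the crowd's number disagree, and you are the one who is right.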

I have used Jcapper as my data "geek" program for many years now. I have also used Dave Schwartz's HSH (another good one that I have not tried is Ken Massa's HTR - I have several friends who use it and are impressed). One thing you will notice about Dave, Jeff (of Jcapper) and Ken is that they are 100% focused on value. Value is all that matters when they model.

Jeff, for example, weights 700-plus factors in some way, shape, or form. His overall rating, though, is based not on how great the factors are at predicting, but on how great they are at signaling that a horse will be underbet relative to its chances. If a factor has only good predictive quality but great value quality, it goes into the box.
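I have no idea how Jcapper scores its factors internally, so take the following as a toy sketch of the general idea only. The factor names and the eight past results are invented. It bets $1 on every horse showing a factor, then reports both the hit rate and the flat-bet ROI, and ranks the factors by ROI rather than by how often they win.

```python
from collections import defaultdict

# Hypothetical past results: (factors the horse showed, odds to one, did it win?)
results = [
    ({"top_speed_figure"}, 1.0, True),
    ({"top_speed_figure"}, 1.5, False),
    ({"top_speed_figure"}, 0.8, True),
    ({"top_speed_figure"}, 2.0, False),
    ({"ugly_but_fast"}, 8.0, False),
    ({"ugly_but_fast"}, 5.0, True),
    ({"ugly_but_fast"}, 6.0, False),
    ({"ugly_but_fast"}, 12.0, False),
]


def factor_report(results):
    """Return {factor: (hit_rate, roi)} for flat $1 win bets on every horse
    that showed the factor."""
    bets = defaultdict(int)
    wins = defaultdict(int)
    profit = defaultdict(float)
    for factors, odds_to_one, won in results:
        for f in factors:
            bets[f] += 1
            if won:
                wins[f] += 1
                profit[f] += odds_to_one  # profit on a winning $1 bet
            else:
                profit[f] -= 1.0          # the lost stake
    return {f: (wins[f] / bets[f], profit[f] / bets[f]) for f in bets}


report = factor_report(results)

# Rank by value (ROI), not by predictive power (hit rate).
by_value = sorted(report, key=lambda f: report[f][1], reverse=True)
for f in by_value:
    hit_rate, roi = report[f]
    print(f"{f:18s} hit rate {hit_rate:.0%}  ROI {roi:+.0%}")
```

In this made-up sample the better predictor wins half the time and still loses money, while the worse predictor wins a quarter of the time and shows a profit. Eight races prove nothing, of course; a real evaluation needs thousands of starts.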

This is why a Jrating might have a hit rate of only 28% but a much better ROI than that hit rate would suggest. The rating seeks horses that look off form, or in some cases terrible on the form. One factor, for example, actually scores horses that look really ugly on paper but are fast. They don't win as often as the chalk, but when they do, they pay.
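The arithmetic behind a low hit rate with a healthy ROI is simple enough to sketch in Python. The 28% hit rate comes from the example above; the 35% chalk hit rate and both average payoffs are assumptions picked purely for illustration.

```python
def flat_bet_roi(hit_rate, avg_win_odds_to_one):
    """ROI per $1 flat win bet: winners return avg_win_odds_to_one in profit,
    losers cost the $1 stake."""
    return hit_rate * avg_win_odds_to_one - (1.0 - hit_rate)


def break_even_odds(hit_rate):
    """Average odds-to-one on winners needed for a flat-bet ROI of zero."""
    return (1.0 - hit_rate) / hit_rate


# Assumed numbers: chalk wins more often but at short prices.
chalk_roi = flat_bet_roi(0.35, 1.5)
# A 28% hit rate with winners averaging 3-1 (the 3-1 is an assumption).
rating_roi = flat_bet_roi(0.28, 3.0)

print(f"chalk:  hit rate 35%, ROI {chalk_roi:+.1%}")
print(f"rating: hit rate 28%, ROI {rating_roi:+.1%}")
print(f"break-even average odds at a 28% hit rate: {break_even_odds(0.28):.2f}-1")
```

At a 28% hit rate, anything better than about 2.6-1 on average for the winners is profit. The hit rate tells you how often you cash, not whether you come out ahead.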

This is something that traditional horseplayers have trouble with. Ours is an old game and it's based on feel, or angles, or what we've always done. I don't know how many times I have looked at a highly rated Jcapper or HSH horse and seen running lines, or an overall profile, that I hate. The horse is sitting on the board at 6-1 and I may say "he should be 15-1" based on what I know about horse racing. But then the horse goes and runs well. You've seen these horses too. "How did he win? He doesn't have ______", where the blank is something we look for in the form, as it is ingrained in our horse handicapping subconscious.

These are the halcyon days for election modeling. Horse racing "datacapping" is far behind. But make no mistake: while you want to be wary of using election modeling to bet serious money, in horse racing you can use a model, and you do want one. Betting on horses via a model gets rid of the noise, yes, but it does more than that. Horses with hidden factors win, and when they do, we want to be on them, because it's one of the few ways to beat the rake.

