JC / Railbird

Data

For Reference

Historical and popular handicapping criteria, applied to the top 22 Kentucky Derby prospects, listed according to graded stakes earnings. The spreadsheet includes the complete 2003-2008 Derby fields and the top three finishers 1998-2008 for reference, and will be updated once more during Derby week, after all workouts are done and post positions have been drawn. Note: This year I’ve added two columns, one for “Started on dirt,” another for “Won on dirt,” for those concerned about the surface factor.

Any questions or suggestions? Please let me know in the comments (thanks for asking about when this would be up, Jeff). I’ll be returning to the spreadsheet, stats, and Derby handicapping next week.

4/21/09 Addition: Geno at Equispace has also been hard at work compiling data, and has posted a thorough spreadsheet that includes Beyer speed figures and dosage for the top Derby prospects.

Locked Up

Ray Paulick posted a piece this morning on the Jockey Club’s possible expansion into the tote business. It includes a bit on Equibase and its practice of locking all data up behind a paywall, something most major sports don’t do. “It’s short-term thinking,” says an executive quoted by Paulick. “If our objective in racing is for the horseplayers to win, we should do everything we can to help him, and increase the churn. That’s where the revenue for our business should come from, not from the statistics the horseplayer needs.” Heck, yes.

On the topic, here’s a bit from a post on June 5, 2008:

The Supreme Court squashed Major League Baseball’s attempt to maintain exclusive control of player statistics, turning down its appeal of an Eighth Circuit Court ruling that allowed fantasy baseball leagues to use the data without paying a licensing fee. “The information used in … fantasy baseball games is all readily available in the public domain,” said the appeals court, “and it would be strange law that a person would not have a First Amendment right to use information that is available to everyone.” Well, this is interesting … and most definitely relevant to the industry. Applied to racing, this ruling could be interpreted to mean that almost all data and statistics in the past performances and results charts are in the public domain (which makes it ridiculous that Equibase buries historical charts behind a paywall), but not presentation of the data or statistics [so no straight re-posting of PDF charts], or analysis derived using proprietary methods (such as speed figures).

CBSSports.com responded to the Supreme Court’s decision by launching a new site that makes available data for baseball, as well as football, basketball, hockey, and auto racing. I’d love to see a similar initiative in racing. As baseball stats wizard Bill James said,

People take information and build knowledge. When you give them new information they will create new knowledge, absolutely and without question.

Free data and historical stats, that’s the way to build the fan base.

“If you look back to 1990 and see what information was available and how it was made available, we’ve accomplished a lot,” Equibase president Hank Zeitlen tells Paulick, and that might be true — but it’s not enough.

The Next Web

“Turns out, that there is still huge unlocked potential, there is still a huge frustration that people have, because we haven’t got data on the web as data.”

In his TED talk, Tim Berners-Lee recalls inventing the WWW twenty years ago and observes that the web’s original purpose of linking documents together is evolving into one of linking data. (Think APIs, think of the potential for racing. Amazing, right? Try not to get too discouraged contemplating the current state of data distribution in the industry.)

Synth to Dirt, No Problem

A little breakfast time research yields this nugget:

Of the 460 nominees to the Triple Crown, 61 have made the switch from a synthetic surface to a fast dirt track. Of those, 47 improved or replicated their synthetic form on dirt.

Details in this Google doc. Only horses who raced primarily on synthetics at the start of their careers and who switched from such a surface to a fast dirt track are included (so horses whose only dirt starts came over the Monmouth slop at the 2007 Breeders’ Cup are not represented). Also, I made no distinctions between synthetic surfaces and didn’t consider class or distance changes. Generally, results were marked positive (P) if a horse showed an improved Beyer speed figure (BSF) and/or finish position, negative (N) if the opposite, and consistent (C) if it ran within 3 BSF points of its synthetic figure and/or showed similar placing.
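For anyone who wants to run the same kind of tally on their own numbers, here is a rough Python sketch of the P/N/C marking. It is a simplification, not the method behind the Google doc: it looks only at the change in speed figure and ignores finish position, which the marks above also took into account. The function names and the sample figures are made up for illustration.

```python
from collections import Counter

BSF_TOLERANCE = 3  # the +/- 3 BSF band described in the post


def classify_switch(synthetic_bsf, dirt_bsf, tolerance=BSF_TOLERANCE):
    """Mark one synthetic-to-dirt switch as 'P', 'N', or 'C'.

    Assumption: only the speed figure is compared; finish position
    is not considered in this sketch.
    """
    change = dirt_bsf - synthetic_bsf
    if abs(change) <= tolerance:
        return "C"  # consistent: within the tolerance band
    return "P" if change > 0 else "N"


def held_form_share(marks):
    """Fraction of horses marked P or C (improved or replicated form)."""
    counts = Counter(marks)
    return (counts["P"] + counts["C"]) / len(marks)


# Hypothetical (synthetic BSF, dirt BSF) pairs, not taken from the Google doc.
examples = [(88, 95), (90, 91), (92, 80)]
marks = [classify_switch(s, d) for s, d in examples]
print(marks, held_form_share(marks))
```

Applied to the totals quoted above, 47 of 61 works out to about 77 percent of switchers holding or improving their form.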

The odds are good that the California synthetic surface form of Colonel John and Bob Black Jack will hold up at Churchill.

Related: Andrew Beyer rants:

But in the 3-year-old stakes races that precede the Kentucky Derby, the presence of synthetic tracks has not merely complicated the game. It has made rational handicapping judgments almost impossible.

Not really. Synthetics are different, but not inexplicable.
