“I’ve been developing my own handicapping software as a hobby since 2007, but the biggest problem I’ve faced is having enough data to properly calibrate and test the system,” said Ligett. “The data provided as part of the contest is just what I needed to move the idea forward.”
Lock up data, squelch innovation. Give developers data, spur innovation.
Dublin, dropped from my PDI top 10 after the Arkansas Derby, returns this week at #1, a move driven by Eskendereya’s withdrawal from the race and a few hours with the past performances. Last week, when the field was looking set, I was intent on figuring out who would run behind Eskendereya — I know, I should feel more embarrassed to admit that. Every wise guy out there has been complaining about how with the loss of Esky, all the odds on the horses they were really planning to play have plunged. Whatever. The colt had the two best Beyer speed figures of this bunch, a perfect prep season, a fitting pedigree. He’s also physically impressive — watching at Aqueduct on Wood Day, I was struck by how much more mature and robust he looked than the other starters (check out his chest and shoulders in this photo by Sarah K. Andrew). Watching the Wood replay, what grabbed my attention was how much he reminded me of Big Brown (and I wasn’t even a Big Brown fan), exhibiting a similar control and ease as he took the lead and drew away. I was going to bet the chalk on Saturday, and happily.
As for Dublin, I still have some concerns he won’t relish the Derby distance, but then, ten furlongs seem questionable for several of this year’s expected starters, who, for the most part, haven’t made much of an impression on me. His track work this weekend could also suggest problems: After attempting to bolt during a Saturday gallop, Dublin drifted out around the final turn in his Sunday work. What’s more, DRF clocker Mike Welsch noted, “the failure to gallop out with any serious energy cannot be taken as positive signs less than one week out from the big event.” A factor in his favor, though, is the relative toughness of the Oaklawn preps, in which Dublin ran well. Off a second in the Southwest, a third in the Rebel, and a fast-closing third in the Arkansas Derby,* he could be poised to move forward.
Devil May Care, coming into the Derby with a competitive profile and a slightly faster time in the G2 Bonnie Miss Stakes than Ice Box ran in the G1 Florida Derby on the same day, moves to #2 and Sidney’s Candy to #3. Lookin at Lucky remains at #4, despite his exceptional qualities. I would rate him higher but for his tendency to find trouble and his light schedule of only two preps this year. There’s also the matter of blinkers-on, blinkers-off: Trainer Bob Baffert is still trying to figure out the colt, and he’s running out of time. But then, the new Derby favorite worked brilliantly this morning. (As I try to sort it all out this evening, Bill Finley’s see-no-works, hear-no-works approach to Derby week suddenly seems a very sensible one.)
PDI top 10 for 4/27/10: 1) Dublin 2) Devil May Care 3) Sidney’s Candy 4) Lookin at Lucky 5) Endorsement 6) Awesome Act 7) Jackson Bend 8) American Lion 9) Discreetly Mine 10) Stately Victor
Call it the Twerby? The 2009 Derby was the first in which Twitter played a real role, even if it was mostly to inspire an ongoing debate about the usefulness of the service. This year, however, Twitter has been a source of fast-changing news (see Ed DeRosa’s tweets Sunday on Eskendereya skipping work, doubtful for the Derby, out of the Derby), close-ups of contenders (see Frances J. Karon’s pictures of Dublin and Devil May Care), and workout times. Thanks to Dana Byerly of Green But Game for pointing out this Blood-Horse article on Monday’s Derby works, which cites tweets from Churchill’s media department. Observed Vic Zast, by tweet of course:
Not amazing that Esky out of Derby. Favs can drop out in last week. What’s amazing is how fast social networking sites passed news along.
How much a scene can change in just a year, and for the better.
*I was asked last week about column 15, “Key Derby Preps,” on the historical criteria spreadsheet. Each number there is simply a count of how many such races a horse started in while prepping. Qualifying races were determined by the total number of Derby starters that emerged from each race, as well as the total number of those starters that finished in the money (ITM) in the Derby, 1998-2008. A dozen races rated highly on both counts. It’s a quick measure of contenders’ preps, based on recent trends. Kevin Martin of Colin’s Ghost has done much deeper research on Derby preps: I recommend his work for more insight into using historical trends for judging prep races.
On first reading, I thought there was an error in the headline of the press release: “Equine Injury Database Statistic Released by The Jockey Club.” But no, the Jockey Club did release just one statistic, and it is a sobering figure:
Based upon a year’s worth of data beginning November 1, 2008, from 378,864 total starts in Thoroughbred flat races at 73 racetracks … 2.04 fatal injuries were recorded per 1,000 starts.
TJC did not report the actual number of deaths, but the Courier-Journal did the math, coming up with:
… about 773 horse deaths, or an average of nearly 15 fatal injuries a week.
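The Courier-Journal’s arithmetic is easy to reproduce. Here is a minimal sketch that uses only the two figures from the release. The variable names are mine, and treating the reporting year as 52 weeks is an assumption:

```python
# Figures quoted from The Jockey Club release.
TOTAL_STARTS = 378_864
FATAL_PER_1000_STARTS = 2.04

# Assumption: the year of data is treated as 52 weeks.
WEEKS_PER_YEAR = 52

# Rate per 1,000 starts times starts, scaled back to a count of horses.
estimated_deaths = TOTAL_STARTS * FATAL_PER_1000_STARTS / 1000
deaths_per_week = estimated_deaths / WEEKS_PER_YEAR

print(f"Estimated fatal injuries: {estimated_deaths:.0f}")  # 773
print(f"Per week: {deaths_per_week:.1f}")                   # 14.9
```

Because the published rate is itself rounded to two decimals, the count is an estimate, which is presumably why the paper says “about.”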
For comparison, the New York Times offers:
In England, for example, the average risk of fatality ranges from 0.8 to 0.9 per 1,000 starts. In Victoria, Australia, studies reported the risk of fatality from 1989 to 2004 at 0.44 per 1,000 starts.
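To put those rates side by side, here is a rough sketch of the ratios. The dictionary labels and the helper name `rate_ratio` are mine. The figures cover different years and were presumably gathered under different methods, so treat the result as a ballpark rather than a like-for-like comparison:

```python
# Fatal injuries per 1,000 starts, as quoted in the text.
US_RATE = 2.04
COMPARISONS = {
    "England (low end)": 0.8,
    "England (high end)": 0.9,
    "Victoria, Australia": 0.44,
}


def rate_ratio(rate: float, baseline: float) -> float:
    """How many times larger `rate` is than `baseline`."""
    return rate / baseline


ratios = {place: rate_ratio(US_RATE, r) for place, r in COMPARISONS.items()}

for place, ratio in ratios.items():
    print(f"{place}: the U.S. rate is {ratio:.2f} times as high")
```

On these numbers the U.S. figure comes out at more than twice the English range and more than four times the Victorian one.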
More detailed data, although not track-by-track stats, will be released at the Welfare and Safety of the Racehorse Summit in June.
Kentucky Derby 2009 bump chart, from Charts and Graphs:

Created with ggplot2 for R. More from Learning R on its construction.
Mine That Bird’s rail dash looks no less improbable graphed, while Pioneerof the Nile’s even run looks better. Too bad there’s no Trakus data, which would make for a busier, but surely richer and more revealing, chart.
Historical and popular handicapping criteria, applied to the top 22 Kentucky Derby prospects, listed according to graded stakes earnings. The spreadsheet includes the complete 2003-2008 Derby fields and the top three finishers 1998-2008 for reference, and will be updated once more during Derby week, after all workouts are done and post positions have been drawn. Note: This year I’ve added two columns, one for “Started on dirt,” another for “Won on dirt,” for those concerned about the surface factor.
Any questions or suggestions? Please let me know in the comments (thanks for asking about when this would be up, Jeff). I’ll be returning to the spreadsheet, stats, and Derby handicapping next week.
4/21/09 Addition: Geno at Equispace has also been hard at work compiling data, and has posted a thorough spreadsheet that includes Beyer speed figures and dosage for the top Derby prospects.
Ray Paulick has posted a piece this morning on the possible expansion of the Jockey Club into the tote business. It includes a bit on Equibase and its practice of locking all data up behind a paywall, something most major sports don’t do. “It’s short-term thinking,” says an executive quoted by Paulick. “If our objective in racing is for the horseplayers to win, we should do everything we can to help him, and increase the churn. That’s where the revenue for our business should come from, not from the statistics the horseplayer needs.” Of course, bloggers have been exploring this issue for some time. Previously on Railbird …
From a post on June 5, 2008:
The Supreme Court squashed Major League Baseball’s attempt to maintain exclusive control of player statistics, turning down its appeal of an Eighth Circuit Court ruling that allowed fantasy baseball leagues to use the data without paying a licensing fee. “The information used in … fantasy baseball games is all readily available in the public domain,” said the appeals court, “and it would be strange law that a person would not have a First Amendment right to use information that is available to everyone.” Well, this is interesting … and most definitely relevant to the industry. Applied to racing, this ruling could be interpreted to mean that almost all data and statistics in the past performances and results charts are in the public domain (which makes it ridiculous that Equibase buries historical charts behind a paywall), but not presentation of the data or statistics [so no straight re-posting of PDF charts], or analysis derived using proprietary methods (such as speed figures).
CBSSports.com responded to the Supreme Court’s decision by launching a new site that makes available data for baseball, as well as football, basketball, hockey, and auto racing. I’d love to see a similar initiative in racing. As baseball stats wizard Bill James said,
People take information and build knowledge. When you give them new information they will create new knowledge, absolutely and without question.
Free data and historical stats, that’s the way to build the fan base.
And here’s a commenter responding to another post from September 9, 2008:
I feel silly declaring any facet the BIG problem, considering the state of the industry these days, but o_crunk brings up an issue that I think has become a major drag on racing, and that’s all the data and information and video tied up by exclusives and hidden behind paywalls. It’s tough to market this game when you force players to jump through endless hoops to get the most basic information and watch races — this is especially so when we talk about marketing racing to gamblers, and focusing more on the game. Racing can’t hold on to the people who might be the most interested if, after hooking their attention and enticing them with promises of easy wagering, intellectual stimulation, and friendly competition, the industry has to tell people this ADW carries these tracks and another those tracks, and you can watch live streaming video, but only if you have a wagering account and you wager a certain amount, and you can’t find simple historical data or get more than the most basic entries and results information without paying and paying and paying again. It’s folly, when we’ve come to a point technology-wise where people expect to go to Google and find whatever they want — or download mobile apps that give them all the game information they can handle for their favorite team or sport.
“If you look back to 1990 and see what information was available and how it was made available, we’ve accomplished a lot,” Equibase president Hank Zeitlen tells Paulick, and that might be true — but it’s not enough.
“Turns out, that there is still huge unlocked potential, there is still a huge frustration that people have, because we haven’t got data on the web as data.”
In the TED talk embedded above, Tim Berners-Lee recalls inventing the WWW twenty years ago and observes that the web’s original purpose of linking documents together is evolving into one of linking data. (Think APIs, think of the potential for racing. Amazing, right? Try not to get too discouraged contemplating the current state of data distribution in the industry.)
A little breakfast time research yields this nugget:
Of the 460 nominees to the Triple Crown, 61 have made the switch from a synthetic surface to a fast dirt track. Of those, 47 improved or replicated their synthetic form on dirt.
Details in this Google doc. Only horses who raced primarily on synthetics at the start of their careers and who switched from such a surface to a fast dirt track are included (so horses whose single dirt starts were over the Monmouth slop of the 2007 Breeders’ Cup are not represented). Also, I made no distinctions between synthetic surfaces and didn’t consider class or distance changes. Generally, results were marked positive (P) if a horse showed an improved BSF and/or finish position, negative (N) if the opposite, and consistent (C) if it ran +/- 3 BSF and/or showed similar placing.
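For anyone who wants to apply the same marking to their own list, here is a simplified sketch of the P/N/C rule. It looks only at Beyer speed figures and ignores finish position, which I also weighed, so it won’t match every call in the doc. The function name and the sample figures are made up for illustration:

```python
BSF_TOLERANCE = 3  # "+/- 3 BSF" counts as consistent, per the text


def classify_switch(synthetic_bsf: int, dirt_bsf: int,
                    tolerance: int = BSF_TOLERANCE) -> str:
    """Return 'P', 'N', or 'C' for a synthetic-to-dirt switch.

    Simplification: compares speed figures only; finish position
    is not considered here.
    """
    change = dirt_bsf - synthetic_bsf
    if abs(change) <= tolerance:
        return "C"  # consistent: within the tolerance band
    return "P" if change > 0 else "N"


# Hypothetical (synthetic BSF, dirt BSF) pairs, not taken from the doc.
examples = [(85, 92), (90, 88), (95, 84)]
labels = [classify_switch(s, d) for s, d in examples]
print(labels)  # ['P', 'C', 'N']

# Share of switchers that improved or held form, from the counts above.
SWITCHED, IMPROVED_OR_HELD = 61, 47
share = IMPROVED_OR_HELD / SWITCHED
print(f"{share:.0%}")  # 77%
```

The 77% figure lumps the P and C horses together, matching the “improved or replicated” wording in the nugget.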
Will the California form of Colonel John and Bob Black Jack hold up at Churchill? The odds suggest so.
Related: Andrew Beyer rants:
But in the 3-year-old stakes races that precede the Kentucky Derby, the presence of synthetic tracks has not merely complicated the game. It has made rational handicapping judgments almost impossible.
Now, now, Andy, this isn’t so. Synthetics are different, but not inexplicable.
Copyright © 2000-2010 by Jessica Chapel. All rights reserved.