JC / Railbird

Data

2017 Saratoga Babies

They’re off at Saratoga, and that means I’m once again tracking every juvenile race and every juvenile starter in the Spa babies spreadsheet. Through the first few days of the meet, trainer Todd Pletcher is, as usual, the leader in number of 2-year-old starters. He’s sent out eight but won only two races, and neither of the winners was a post-time favorite. Go figure.

I update the spreadsheet after each day’s card. You can sort the sheet by column. You can also download a copy as an Excel or CSV file for your use.
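
If you download the CSV and want a quick starters-and-wins tally by trainer, here is a minimal Python sketch. The column names (`trainer`, `horse`, `finish`) and the sample rows are placeholders I made up for illustration; the actual sheet's headers and contents will differ, so adjust accordingly.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; these are NOT real results,
# and the real sheet's column names are an assumption here.
SAMPLE_CSV = """trainer,horse,finish
Trainer A,Horse 1,1
Trainer A,Horse 2,4
Trainer B,Horse 3,2
Trainer A,Horse 4,1
"""


def starters_and_wins(csv_text):
    """Return (starters, wins) Counters keyed by trainer name."""
    starters = Counter()
    wins = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        starters[row["trainer"]] += 1
        # A finish position of "1" counts as a win.
        if row["finish"].strip() == "1":
            wins[row["trainer"]] += 1
    return starters, wins


starters, wins = starters_and_wins(SAMPLE_CSV)
# most_common() lists trainers from most starters to fewest.
for trainer, count in starters.most_common():
    print(f"{trainer}: {count} starters, {wins[trainer]} wins")
```

To run it against a downloaded file, read the file's text and pass it to `starters_and_wins` in place of `SAMPLE_CSV`.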

Finding a Way

From Jay Bergman’s remembrance of Sports Eye founder Jack Cohen:

What was so impressive about Mr. Cohen to me was his ability to get around obstacles in his way. At the forefront of his effort to put out past performances in Sports Eye, his publication that had only given entries, results and selections, was an obstacle called the track program. The program printed for the tracks at Roosevelt and Yonkers was a product of Doc Robbins. That product was a monopoly of sorts and Mr. Cohen had to figure out how to get access to information in advance of race day in order to provide “his version” of past performances in a timely fashion. Ultimately what he did was pay off Robbins’ employees to provide him a minimal amount of information that would allow for his staff to connect-the-dots and provide a competitive product.

I love these stories of people hustling to get around data monopolies, even if they’re all from decades ago, and racing data is now stagnant.

Figuring

The March edition of HANA’s monthly newsletter is now out, and it includes two great interviews, one with jockey Julien Leparoux, and the other with Dana Byerly talking about Horse Racing Data Sets, the site she launched last month for sharing data. I’m biased, but HRDS is swiftly becoming a good, useful resource — the most recent addition to the site is a spreadsheet from Brisnet containing 25 years of winning speed and class ratings, which I’ve just begun exploring for possible Kentucky Derby implications.

Somewhat related: TimeformUS posted their winning figures for the last five years of Triple Crown race preps. You can find Beyer speed figures for the same races since 2010 in the Derby prep schedule (the column labeled “BSF”).

HANA’s newsletter also includes a short primer on churn, for which Lonnie Goldfeder recommends setting a goal each day you play. Goldfeder’s latest column at Daily Racing Form is about staying sharp; it’s a reminder that wagering, like any discipline, requires a commitment to practice.

Data Points

Marketing horse racing through its rich data is on the agenda for the 2013 UA-RTIP Symposium on Racing and Gaming:

New Ways to Look at Numbers
Sports fans are traditionally a group of people who have an insatiable hunger for facts, figures and statistics. Racing is a sport that is data rich but that attribute hasn’t been marketed. Panelists look at new data that could be presented to the racing audience, new ways to present the information we currently provide as well as how all of it can be used to attract new customers and increase the frequency of current players.

It’s also the subject of Thorotrends’ call to “release the data,” which I hope the Symposium data panelists will read before they arrive in Arizona, along with everything Superterrific has gathered on the issue of freeing racing data from paywalls and PDFs in her latest on Exacta-mundo.

Making data more available can only help attract more horseplayers. I’ve believed so for as long as I’ve been a racing fan, and have only been confirmed in that belief watching other sports move ahead with data, whether in creating APIs, building it into mobile apps, supporting hackathons, or holding events such as Major League Baseball’s Bases Coded, in which teams competed “to create the next great interactive media product for baseball fans.”

Note, I’m not advocating that past performances and other handicapping products should be free, or that Equibase should release all the data it collects via an API without restrictions, although I do think it should release the majority of its data and without a significant lag. (Just as full charts can be downloaded within an hour of a race, so should race data.)

If you’re wondering what free(er) data might look like in racing, consider the models that already exist, ranging from MLB’s minimalist Gameday API to ESPN’s robust developer center. Imagine if Equibase created something similar to ESPN, which opens its data feeds to users for non-commercial applications with some usage restrictions (such as limiting the number of API calls within a set period) — as Thorotrends writes, the majority of racing fans would continue to use data as they always have, but there would be a small group who would hack and experiment. It would make racing feel less stagnant and less mysterious, leading to more fans and more wagering.
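
For the curious, a usage restriction like "no more than N API calls within a set period" can be sketched in a few lines of Python. This is a generic sliding-window limiter, not how ESPN, MLB, or Equibase actually enforce limits; the class name, parameters, and the two-calls-per-minute figure are all hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling period_seconds window."""

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock    # injectable so the limiter can be tested
        self.calls = deque()  # timestamps of recently allowed calls

    def allow(self):
        """Return True and record the call if it fits within the limit."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


# Demo with a fake clock: two calls allowed per 60 seconds (made-up limit).
fake_now = [0.0]
limiter = SlidingWindowLimiter(2, 60, clock=lambda: fake_now[0])
first = limiter.allow()    # within the limit
second = limiter.allow()   # within the limit
third = limiter.allow()    # over the limit, refused
fake_now[0] = 61.0         # the window has rolled past the earlier calls
fourth = limiter.allow()   # allowed again
```

A data provider would keep one such limiter per API key, which is what lets the feed stay open to hobbyists without inviting bulk commercial scraping.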

Market the data, certainly, but free the data first.

10/14/13 Update: Yes! From Dana Byerly, here’s a real-world example of how a horse racing API could be used.
