In a corner conference room in the library of Olin College, a team of five Babson students has assembled on a Saturday morning. The students are sprawled out in chairs, laptops open in front of them. The air smells of oranges.
The students are here for DataFest, a national undergraduate competition in which teams try to extract insight from a large, complex set of data. At the moment, though, the team isn’t focused so much on data as on food. The table in front of them is filled with fruit. “We snack on fruit rather than junk to keep our brain cells moving,” says Aryan Jain ’17. Oranges, bananas, grapes—they’re all favorites. “You got to stop for food,” says Ben Greenspan ’17, who pulls out some kiwi. “It’s time for my secret weapon.”
The team needs sustenance because DataFest makes for a long weekend. The event kicked off the night before, and now the team expects to be cooped up in the library for the entire Saturday, poring through numbers from 8 a.m. until who knows when. “We’ll be here until 11 or midnight, at least,” Jain says. The team, which also includes Michael Gorman ’17, Yerim Kim ’16, and Wayne Kwon ’16, will then return on Sunday to finish.
All of this raises the question: Why? Why give up your entire weekend to work on data? “We’re nerds,” says Gorman, though Jain quickly corrects him. “We’re not nerds,” Jain says. “We’re just really into data analytics.”
They’re not the only ones. About 35 students in all, from Babson, Olin, and Wellesley colleges, were holed up in Olin’s library for the weekend competition. Broken into teams, the participants were given actual data from Ticketmaster, the giant ticket retailer, and then tasked with analyzing it to find whatever wisdom was hidden there: say, about price optimization, the likelihood of customers reselling tickets, or the effectiveness of certain words in Google ads.
The amount of information the teams had to wade through was enormous. Ticketmaster provided three large data sets. One of those alone contained 4.6 million rows of data. When Jain and his teammates went to work, they first tried to open one of the data sets in Excel. “It crashed,” Jain says. “We didn’t realize how big it was.” The team quickly switched to a heftier program. “We’re used to dealing with a bucketload of data,” Jain says. “This is like the Atlantic Ocean.”
Competing in DataFest required employing a host of skills. Jain and company had to “clean” or prepare the data by removing inconsistencies and redundancies. They then had to analyze the data, glean insight from it, and finally communicate their findings. “The process of analytics isn’t just the technical aspects. It’s not just about running the numbers,” says Davit Khachatryan, assistant professor of statistics and analytics. DataFest contests have sprung up at colleges across the country since the first was held at UCLA in 2011. Khachatryan helped organize the event at Olin together with John McKenzie, an associate professor emeritus at Babson, and professors from Wellesley and Olin.
Khachatryan says Jain and the team were well prepared for DataFest, but they particularly excelled on Sunday afternoon as they explained their findings in a final presentation to the competition’s three judges. “We treated it like a real-world project,” Jain says. “We presented our results as if we were presenting to the board of directors at Ticketmaster, who might have little, if any, knowledge of data science.” Such an approach is key when dealing with the complexities of data. “You have to be a very good communicator,” Khachatryan says. “You have to come up with something meaningful and effective, and be able to explain it to people who are not familiar with it.”
The judges were so impressed by the team that they created a new contest category, “Best Business Proposal,” to honor its efforts. “The final result was extremely rewarding,” says Jain, though he admits that the weekend was far from relaxing. “I can’t say this is a competition for everyone,” he says. “Only those who are truly passionate about data can survive.” —John Crawford