Scraper-Wiki: A Wake-Up Call

by Megan Cloherty

Being a journalist requires a certain amount of perspective. One has to be aware of what one knows and what one doesn’t know, and one has to understand the scope of the story. After nearly two years in an interactive journalism major, where my classmates and I have learned a fair amount about coding, website design and computer languages, we student-journalists thought we had a good grasp of what we knew, what we did not know and the scope of computer programming. We had no idea.

Attending the Scraper-Wiki conference at The Washington Post building on March 31 was a wake-up call for many of us in American University’s Cohort 13 graduate class. We joined the conference on its third and final day, when most of the programmers in the bunch had already gone over multiple data sets, deciding which to explore and which to set aside. Few, if any, of those data sets, from what we could tell, were chosen from an editorial standpoint.

There were no stories behind these data sets, no reasons for seeking them out. The programmers at the Scraper-Wiki conference, as a whole, had no interest in the storyline of the data. They were interested in being the man to unearth it.

As a journalist, I was caught off guard. “So how did you decide to pick this data set?” I asked Jeremy Bowers, a senior developer at The Washington Post. “I don’t know,” he said with enthusiasm. “It just looked huge!”

Bowers, by all accounts, is a computer genius, especially given the range of talent in the room. I brought the average down a fair amount. But he did not understand the reason for my questioning, so I tried again.

“Are you doing a story on crime that you need these numbers for?” I asked Bowers. “No. But I saw a really cool breakdown on the L.A. Times site I can show you. What they did with data, you just have to see it,” he responded.

The L.A. Times “Mapping L.A.” project is something to see. It maps the greater L.A. area, showing crime by neighborhood. Using data sets from multiple law enforcement sources, the programmers color-coded areas of the map based on the different types of crime that happened there. By drilling down into the site, the reader could find the population stats, school information and demographics of each neighborhood. It was an impressive visualization that told a story, and it was completely driven by the data.

Bowers laughed in admiration when I asked him how long the project must have taken the L.A. Times team. I quickly realized the old saying ‘Rome wasn’t built in a day’ fit well in this world of data liberation. That’s what the Scraper-Wiki Conference leaders called it: “data liberation.” It seemed a bit dramatic for my taste, but I suppose it took more than a few steps and a few approaches to free these data sets from the antiquated PDF formats that held them captive. ‘Liberation’ might just be the right word.

The trouble with journalists, beyond the obvious joke to insert here, is that we ask too many questions. We want the story to make sense, to add up. The problem with programmers is that they don’t. Data doesn’t need to make sense. These two groups aren’t really ‘in trouble’ per se, until you put them together.

While we worked on our project to unearth data from the Metropolitan D.C. Police crime database, Bowers appeared to get more excited as the steps to extract it became more complex. It was as if the puzzle was multiplying and he couldn’t wait to construct a corner of it. I, however, wanted to know what the puzzle would show us. We were at an ideological impasse. We did not understand each other, but we had to proceed.

In the end, working with Bowers and the other programmers at the “crime” table of the Scraper-Wiki Conference was an eye-opening experience. We didn’t end up with a story or even much data to speak of. That was okay by Bowers. To his credit, he had put in days of effort.

I got used to the idea that I didn’t have a story, and hours later I realized that was okay by me because of what I’d learned at the conference. Data is not an easy thing to find, and it’s not an easy thing to extract. Even after “we” had written code that would read and collate the data in the Metropolitan D.C. Police crime database, too much of it didn’t fit.

These data outliers, if you will, would have required one of us to enter the stray numbers into the correct format one by one. The time that would take made the project unfeasible. But because of that project, I now understand how difficult the process of data scraping can be. I know more about what I don’t know, and more about how little I do.

For now, I’ll stick to the writing bit.

04 2012