We’re recognizing Love Data Week (February 11-15) and this year’s theme is ‘data in everyday life.’ We’ve asked several researchers who participated in our Better Science Through Better Data event to reflect on the importance of data sharing in their own lives. We’ll be sharing their stories all week so keep checking back!
Written by Alasdair Rae
I’m supposed to write all about how I love data and how it can change the world. But I’m not going to. Not because I’m grumpy, but because I think we’re thinking about it all wrong. You see, a lot of the buzz around data seems to suggest that if only we had better data everything would be okay, as if people need data and open data itself is the answer. I think we’ve got this back to front. If we want to make things better, we need to understand how data needs people to make it useful and meaningful, to test it, and to question it.
So, I love data in the same way that I love iron ore: i.e. not very much at all. It’s a nice raw ingredient, but it does need a good bit of work before it becomes useful. Yet I do love what data can do, and I love the new world of open data analysis and the people trying to do good things with it. I might even go so far as to say I love some open source software. But I don’t want anyone thinking I’m weird. Instead, let me tell you a very short story about how I got a bit carried away with myself at work and ended up with new friends and a Tampa Bay Rays headband. My conclusion to all this is that data needs people more than people need data.
I’m quite into mapping and spatial analysis, and this is one of my go-to procrastination methods. In early 2016 a paper I wrote on commuting in the United States was very sensibly rejected by a leading academic journal, so instead of moping around I decided to put the paper online, alongside the data I used. Into the void, as they say. Some time after this, a brilliant American scholar called Garrett Dash Nelson stumbled across the data and paper and did something cool with them. It involved algorithms and code, so I was obviously going to have to take a closer look. Put simply, Garrett applied a network partitioning algorithm to the dataset I had carefully assembled, cleaned and sworn at for several months and, to be honest, he did it a lot better than I ever could have.
So, naturally, I asked him if instead of just doing it for Massachusetts, he wanted to attempt it for the whole United States and write a paper together. With a PhD to finish, a job to find and no spare time, Garrett somehow still said ‘yes’. We proceeded to write a paper on the ‘economic geography of the United States’, and found that the US can be divided into 50 or so ‘megaregions’, which are kind of like states, except that their boundaries are based on where people work. Fast forward to January 2019 and our paper has now been viewed almost 290,000 times, and one of our megaregion maps featured on a recent cover of the Proceedings of the National Academy of Sciences. I’d normally consider 500 views as ‘going viral’, so the reaction to our paper was quite nice. We also published all our files as open data on Figshare and this now has the highest Altmetric Attention Score of any dataset in the UK. We even hosted a successful AMA on Reddit.
The results resonated far and wide and we were contacted by people working in industries we couldn’t possibly have imagined would find our work useful. An executive from the Tampa Bay Rays said our work was useful as he planned a new baseball stadium (thanks for the merch!), an epidemiologist said our new boundaries helped him understand disease transmission, a renewable energy expert said our work was just what she needed, a Silicon Valley infrastructure planner said our boundaries made perfect sense, and someone called ‘monkeychef’ on Reddit said “Holy crap they drew the definitive line to divide north and south Jersey”.
Why do I think this story demonstrates that data needs people more than people need data? The first reason is that this wasn’t new data. It just needed people with the inclination and time to make it useful. The fact that it was open data wasn’t enough. The second reason is that the data needed human input to make it appealing and accessible. A third reason is that it needed human interaction to take it beyond the realm of data into information and knowledge. The knowledge came about through that interaction. A fourth reason is that it needed people from different disciplines and backgrounds to see the richness in it and the value of what it could tell us. But perhaps the most important reason in this story is that the data needed people to create it in the first place. The daily grind of the commute for 130 million Americans is what made this data. Without them, there would be nothing.
If this reads something like a manifesto, that’s not entirely coincidental. During Love Data Week I think we should focus on what data can help us do, and the people who help us achieve it.
Watch Alasdair’s lightning talk at #SciData18 here.
Springer Nature is committed to supporting researchers in sharing research data and in receiving the credit they deserve. Read more about our research data products and services.
About Alasdair Rae
Alasdair Rae is a Professorial Fellow in the Department of Urban Studies and Planning at the University of Sheffield. His research focuses on cities, regions, housing markets, neighbourhoods, inequality, transport and spatial analysis. He uses data a lot, and he likes to make maps. You can find out more about his work on his website and his frequently updated Stats, Maps n Pix blog. Follow him on Twitter at @undertheraedar.