Big Data in the United States

Over the past five years, thanks to advances in data storage and computer processing, our capacity to make use of the massive quantities of data we produce each day has erupted. Today, government and industry are just starting to explore the new world of information that has opened up since the turn of the millennium, a world that’s been assigned a misleadingly simple name: Big Data.

While Big Data promises to be a significant driver of economic growth in the coming decades, that growth will not be distributed evenly. So which cities are in the best position to take advantage of this new frontier?

The Three V’s

While Big Data is a rapidly evolving concept, it is often characterized in terms of the “three v’s”: volume, velocity and variety. The three v’s represent both the challenges and possibilities presented by Big Data, and any effort to capitalize on our new data capabilities, whether by government or industry, must address each.

  • Volume puts the Big in Big Data: it describes the sheer amount of information collected and held by many companies and governments today. For example, as of February 2014, the largest data warehouse in the world held 12.1 petabytes of data (see note 1). Even that understates the size of data today, as many major databases are spread across multiple, geographically distinct servers (the term for this is cloud storage).
  • Velocity is a corollary of volume: with that much data, it becomes necessary to move and analyze it very quickly; otherwise, wait times will render the data effectively useless.
  • The third element, variety, describes the multiple formats and uses of modern data: not just the structured, numeric databases of yesteryear, but unstructured information like Facebook’s database of photos, or Google’s 500 billion word database of books (the Ngram database).

To those three elements, we would add one more, which applies specifically to cities and governments: openness. Cities have a unique opportunity to use Big Data for good—think fewer traffic jams, more efficient energy use, shorter police and fire response times—but doing so will require an inclusive approach. Open data leverages the skills and expertise of anyone with web-access, making possible breakthroughs and advancement that government alone could never accomplish.

Methodology

To develop our Big Data index, SmartAsset started with the Open Knowledge Foundation’s Open Data Census, a crowd-sourced measure of data openness in major U.S. cities. The Open Data Census looks at openness across 19 types of data typically collected by city governments, including crime, budget, transit and zoning. Cities are scored on their openness in each area, with total scores ranging from 0 to 1900 (each data type is worth up to 100 points). A high score for openness is reflective of a city that is taking the lead when it comes to making smart usage of the data it collects.

In addition to that openness score, for each of the 56 major U.S. cities included in the Open Data Census (see note 2), we looked at three other factors indicative of the city’s capability to take advantage of big data in coming years. Specifically, we chose factors that reflect the “three V’s” of big data:

  1. Volume: The number of companies specializing in data processing, hosting and related services, per 100,000 residents.
  2. Velocity: Internet download speeds (megabits per second).
  3. Variety: Percent of the workforce employed in computer and mathematical occupations. One of the greatest challenges in the age of big data is how to make sense of it all. Doing so requires a highly skilled workforce, especially in the areas of computer science, mathematics and statistics (see note 3).

We ranked every city in our study on each of the four factors (including openness). We then averaged those four rankings and assigned a score from 0 to 100 based on that average. A city that ranked first in every category would score a perfect 100, while a city that ranked last in every category would score zero. The results, below, are SmartAsset’s Big Data index.
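
The rank-and-average approach described above can be sketched in a few lines of Python. This is only an illustration, not SmartAsset's actual calculation: the linear rescaling from average rank to a 0 to 100 score, the tie handling, the function names and the three example cities with their values are all assumptions made for the sketch.

```python
from statistics import mean


def rank_descending(values):
    """Rank values so the largest gets rank 1; tied values share the better rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def big_data_index(cities):
    """Map each city to a 0-100 score from its average rank across all factors.

    `cities` maps a city name to a tuple of factor values, where a higher
    value is better for every factor. At least two cities are required.
    """
    names = list(cities)
    n = len(names)
    factor_count = len(next(iter(cities.values())))
    # One list of ranks per factor, in the same order as `names`.
    ranks_by_factor = [
        rank_descending([cities[name][i] for name in names])
        for i in range(factor_count)
    ]
    scores = {}
    for pos, name in enumerate(names):
        avg_rank = mean(ranks[pos] for ranks in ranks_by_factor)
        # Assumed linear rescaling: average rank 1 -> 100, average rank n -> 0.
        scores[name] = 100 * (n - avg_rank) / (n - 1)
    return scores


# Hypothetical values, in the order:
# (openness score, data firms per 100,000 residents, download Mbps, % computer/math workers)
example = {
    "City A": (1500, 12.0, 40.0, 8.0),
    "City B": (1200, 9.0, 35.0, 6.0),
    "City C": (900, 3.0, 20.0, 2.0),
}
scores = big_data_index(example)
```

In this made-up example, City A ranks first on every factor and City C ranks last on every factor, so they land at the two ends of the scale, with City B in the middle.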

[Figure: the top 10 cities in SmartAsset’s Big Data index]

1. San Francisco, California

While San Francisco has long been recognized as one of the world’s tech capitals, it is quickly establishing itself as one of the world’s Big Data capitals as well. The City by the Bay had the second highest data openness score of any city in America, and efforts by local government to make city data accessible to everyone have been laudable. DataSF.Org, the city’s online clearinghouse for public data, is easy to access and use, and its public API has made possible applications like Parkola, which leads San Francisco drivers to open parking spaces.

Openness isn’t all San Francisco has going for it, however. Add to that the fourth highest percentage of workers employed in computers or mathematics, and the highest overall concentration of businesses specializing in data processing, hosting and related services, and you can see why San Francisco is poised for success in the age of big data.

2. Seattle, Washington

Over the past decade, the development of Amazon’s headquarters has transformed Seattle’s South Lake Union neighborhood from a relatively quiet industrial zone to a bustling center of activity. Amazon, a leader in the cloud computing technologies that are so important in utilizing big data, has had a major impact on the city. Thanks to Amazon and other major technology companies in the region (including Microsoft, Zillow and PayScale), Seattle ranked third in our study for its high concentration of workers in computer and math occupations. These skilled workers will be key drivers of growth in the big data economy in the coming years.

3. Austin, Texas

In 2010, when Google originally announced its Google Fiber program that would bring ultra-high-speed broadband internet to select communities throughout the country, over 1,000 cities applied for the service, but just three—Austin, Kansas City and Provo—were selected for the initial round. While Google is currently in the opening stages of enrolling Austin residents in the program, internet speeds in the city are already rising, as providers have improved service in advance of the added high-speed competition from Google. Indeed, as of early 2015, average download speeds in Austin were double those of other top cities in our study, including San Francisco, Atlanta and Boulder.

Along with its blindingly fast download speeds, Austin rated especially well for its high concentration of businesses specializing in data processing, hosting and related services, and the 5.28% of its workforce employed in computer or math occupations was 10th highest of any city in our study.

4. New York, New York

A city as large as New York faces unique challenges—but it also has an opportunity to meet those challenges in new and innovative ways. It has done just that. Perhaps the most famous example of New York’s data-driven approach to governing is the NYPD’s CompStat system, which (among other things) used crime data to predict when and where crime was most likely and place officers there in advance. That system was introduced in 1995 and has been credited with playing an important role in New York’s falling crime rate.

Since then, New York City has continued to lead the nation in its approach to data. It has the highest Open Data score of any city in America, receiving a perfect score for openness in 13 of the 19 possible categories.

5. Ann Arbor, Michigan

As home to the University of Michigan, which offers top fifteen graduate programs in Computer Science, Statistics and Mathematics, Ann Arbor draws some of the best and brightest in these fields. Ann Arbor has the fifth highest concentration of workers in mathematics and computer occupations of any city in our study.

Ann Arbor is also on the leading edge of several major Big Data initiatives. The University of Michigan library was among the first contributors to Google Books, which aims to scan each of the estimated 130 million unique books in existence into a single, searchable database. Perhaps even more ambitious is Ann Arbor’s recently constructed “M City,” which, when it opens this summer, will be one of the first driverless car districts in the country, a testing ground for automobiles that are operated entirely by computers.

6. Atlanta, Georgia

In recent years, the city of Atlanta has been making efforts to give the public access to city data. In late 2012, Atlanta opened up its public transit data, allowing anyone and everyone to use it for map-building, app-development and research. In 2014, Atlanta was selected to be a “Code for America City,” a program run by the non-profit Code for America that helps cities develop tools and policies to make government more efficient and accessible.

Among the fruits of that program is infrastructuremap.org, an interactive map that displays the costs and goals of potential infrastructure projects throughout the city. Thanks to initiatives like that, Atlanta scored 1215 for its data openness, sixth highest of any city in the country.

7. (tie) Arlington, Virginia

Across the Potomac River from Washington, D.C., Arlington houses a multitude of U.S. government agencies, including the headquarters of the Transportation Security Administration, the Drug Enforcement Administration, the Pentagon, the Department of Defense and the Defense Advanced Research Projects Agency (DARPA). Each of these agencies is relying increasingly on big data to meet its responsibilities, and each employs a large number of experts in computers and mathematics. Arlington has the second highest concentration of workers in these fields of any city in America.

7. (tie) Boulder, Colorado

Boulder is home to a number of high profile—and very large—data centers. Among these are the National Geophysical Data Center, which houses more than 400 databases from the National Oceanic and Atmospheric Administration; the National Snow and Ice Data Center, which houses snow, ice and climate data for researchers around the world; and a 115,000-square-foot IBM data center, which the company called its greenest data center in North America. Installations like these point to the fact that Boulder will likely remain an important part of the big data economy in future years.

9. Washington, D.C.

With projects like data.gov and the U.S. Census Bureau’s FactFinder search tool, the federal government has taken notable steps in recent years to make its data more accessible to the public. It may be taking cues from the U.S. capital’s local government. The District of Columbia scored 1025 for its data openness, tenth highest of any city in the country. It was also among the first cities in the country with an official open data policy: efforts to make city data accessible began in 2006.

10. Kansas City, Missouri

As the first city to enroll in Google Fiber, Kansas City has seen download speeds skyrocket over the past two years. As of early 2015, average download speeds have reached nearly 100 megabits per second, triple the rate of most other U.S. cities. Google has also recently announced that it will be introducing Google Fiber for small businesses in Kansas City, which may help to draw more companies relying on the web to access and analyze data to the area.

[Figure: full results of SmartAsset’s Big Data index]

1. According to the Guinness Book of World Records, the largest single data warehouse is in Santa Clara, California, and holds 12.1 petabytes. A petabyte is 10^15 bytes. If you assume one minute of a standard MP3 audio file requires one megabyte of space, it would take around 2,000 years to listen to one petabyte of music stored as MP3s.

2. Since the Open Data Census relies on teams of open data “librarians” in each city to report on their city’s open datasets, not all cities are scored. If you are an open data expert who lives in a city with great (or not so great) data policies, and which does not appear on the list, we encourage you to go to us-city.census.okfn.org to learn about becoming a librarian.

3. Data on the number of companies offering data processing, hosting and related services comes from the U.S. Census Bureau’s Zip Code Business Patterns series. Data on internet download speeds comes from netindex.com; we averaged download speeds for the first 10 days of January for every city in our study. Data on the percent of the workforce employed in computer and mathematical occupations comes from the Census Bureau’s 2013 American Community Survey.
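
The listening-time estimate in note 1 can be checked with a few lines of Python. This is a rough sketch under the note's own assumption of one megabyte per minute of audio; treating a megabyte as 10^6 bytes and a year as 365 days are additional assumptions made here.

```python
BYTES_PER_PETABYTE = 10**15
BYTES_PER_MEGABYTE = 10**6        # decimal megabyte (assumption)
MINUTES_PER_YEAR = 60 * 24 * 365  # ignores leap years (assumption)

# At one megabyte per minute, each megabyte is one minute of audio.
minutes_of_audio = BYTES_PER_PETABYTE // BYTES_PER_MEGABYTE
years_of_audio = minutes_of_audio / MINUTES_PER_YEAR

print(f"About {years_of_audio:,.0f} years of continuous listening")
```

The result comes out a little over 1,900 years, consistent with the rounded figure in note 1.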

Nick Wallace studied Economics at the University of Washington. He enjoys getting people thinking about finances by looking at the numbers. Nick is a freelance journalist and data analyst living in Michigan. He still lends his economic and analytic expertise for SmartAsset's studies.