Will Moving to the Commercial Cloud Leave Some Data Users Behind?

As part of their missions, federal agencies generate or collect massive volumes of data from such sources as earth-observing satellites, sensor networks and genomics research. Much of that information is useful to commercial and academic institutions, which now can usually access this publicly generated data from agency servers at no charge.

As the volume of data continues to expand, however, many agencies are considering the use of commercial cloud services to help store the data and make it available to users. While agencies may have different strategies, these new partnerships could result in user fees levied on downloads and on analyses performed on the data while it remains in the cloud.

Writing in a policy forum article published February 8 in the journal Science, a Georgia Institute of Technology space policy researcher who studies such data use urges caution about the design of these commercial cloud partnerships and the possible imposition of user fees.

“Under the current system, free and open government data is used by scientists to conduct research, by entrepreneurs to create new businesses, and by citizens and other organizations to promote government transparency,” said Mariel Borowitz, an assistant professor at Georgia Tech. “If users must pay fees to download or analyze the data, this will decrease the ability of these users to access and work with data. Past experience suggests that the impacts of this decrease in data use could be large, both for individual users and for society as a whole.”

Moving data to commercial cloud systems would likely provide broader access and more efficient analysis options, but she cautions that those advantages could be offset by the cost, particularly for organizations with small budgets.

“Agencies risk losing some of the benefits of this transition by not budgeting for the costs associated with data downloads and analysis, up to a reasonable level,” Borowitz said. “Many who would be interested in using the data may not be able to pay the associated fees. Researchers, nonprofit organizations and others who do not directly profit from the use of this data are most likely to be affected.”

Borowitz recently spent two years at NASA, where she witnessed both the development of systems that will dramatically increase data collection and debates about future data storage. She is also the author of a book, Open Space: The Global Effort for Open Access to Environmental Satellite Data, published by MIT Press.

She would like to see the agencies that provide data continue to shoulder the costs, up to some “reasonable level,” to ensure that the data remains readily available to all users. As an alternative to commercial services, some agencies are considering developing their own custom-built cloud solutions, and they will have to weigh the costs and benefits of the different options. There will also be technical, organizational and policy issues to consider.

“Agencies are taking seriously issues of security and long-term preservation of data,” Borowitz added. “When working with commercial providers, some are concerned about the possibility of getting ‘locked in’ to one provider, due to the large costs of migrating data from one system to another. It is possible that costs and capabilities could change over time. On the other hand, commercial cloud providers have large workforces and extensive infrastructure that allow them to provide services and capabilities well beyond what any one agency would be able to maintain.”

Borowitz notes that most agencies have not made final decisions about their cloud-based programs, so there should be adequate time to work through these issues.

“Most agencies that make data publicly available, particularly science agencies, are already discussing and/or beginning to make the transition to cloud systems,” she said. “However, these programs, at agencies like NSF, NIH, NASA and NOAA, are still in their early phases, and there is still opportunity for feedback to be provided and adjustments to the programs to be made.”

The existence of fees for access to government data is not without precedent, but Borowitz argues that past experience suggests user fees result in significantly less use. Before Landsat data (satellite imagery of Earth) was made freely available in 2008, no more than 25,000 images a year were purchased from the collection. “Within a few years of implementing the free and open data policy, the government was distributing 250,000 images a month,” she said.
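The size of that jump can be checked with simple arithmetic. The short Python sketch below annualizes the monthly figure and compares it with the paid-era ceiling. Because 25,000 images a year was a maximum, the resulting ratio is a lower bound on the increase rather than an exact value; the variable and function names are illustrative only.

```python
# Landsat distribution figures as quoted in the article.
PAID_IMAGES_PER_YEAR_MAX = 25_000  # "no more than 25,000 images a year" before 2008
FREE_IMAGES_PER_MONTH = 250_000    # "250,000 images a month" after free and open access


def annualize(per_month: int) -> int:
    """Convert a monthly count to a yearly count (12 months per year)."""
    return per_month * 12


free_per_year = annualize(FREE_IMAGES_PER_MONTH)
# The paid-era figure is a ceiling, so this ratio is a minimum, not an exact value.
min_increase = free_per_year / PAID_IMAGES_PER_YEAR_MAX

print(f"Free and open era: {free_per_year:,} images per year")
print(f"Increase: at least {min_increase:.0f}-fold")
```

Under these figures, free distribution works out to 3,000,000 images a year, at least 120 times the peak volume purchased when fees applied.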

That number offers a sense of the scale that often cash-strapped agencies are dealing with. According to the paper, the National Oceanic and Atmospheric Administration (NOAA) houses more than 100 petabytes (PB) of data and generates more than 30 PB per year from satellites, radars, computer models and other sources. NASA projects that its archive will grow to 250 PB by 2025. And the amount of genomic data at the National Institutes of Health is growing exponentially.

A petabyte is 1,024 terabytes, or roughly a million gigabytes. A gigabyte is 1,024 megabytes. For scale, an average photograph taken by a high-end cell phone camera can be in the neighborhood of 10 megabytes. Laptop computers may be able to store as much as a few terabytes of data.
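To make these unit conversions concrete, the short Python sketch below assumes binary units (a factor of 1,024 between each step, as in the paragraph above) and the ballpark of 10 megabytes per photo. The function names are hypothetical, chosen for illustration. Under those assumptions one petabyte is 1,048,576 gigabytes, enough for roughly 107 million such photos, and NOAA's more than 30 PB per year corresponds to over 3 billion.

```python
# Binary units: each step up (MB -> GB -> TB -> PB) is a factor of 1,024.
# This is an assumption; decimal (SI) units would use 1,000 per step instead.
MB_PER_GB = 1024
GB_PER_TB = 1024
TB_PER_PB = 1024


def petabytes_to_megabytes(pb: float) -> float:
    """Convert petabytes to megabytes using 1,024 as the step between units."""
    return pb * TB_PER_PB * GB_PER_TB * MB_PER_GB


def photos_that_fit(pb: float, photo_mb: float = 10.0) -> float:
    """Approximate number of photos of size photo_mb (in MB) that fit in pb petabytes."""
    return petabytes_to_megabytes(pb) / photo_mb


if __name__ == "__main__":
    print(f"1 PB = {TB_PER_PB * GB_PER_TB:,} GB")
    print(f"1 PB holds about {photos_that_fit(1):,.0f} ten-megabyte photos")
    # NOAA generates more than 30 PB per year, so this is a lower bound.
    print(f"30 PB holds about {photos_that_fit(30):,.0f} ten-megabyte photos")
```

With decimal (SI) units, where each step is 1,000, a petabyte is exactly a million gigabytes, which is why both figures are in common use.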

Borowitz sees the transition to cloud computing as both an opportunity and a challenge for the future availability of government data. “The decisions being made right now about the structure of these programs have the potential to significantly impact researchers and society as a whole, so it is important to raise awareness and increase engagement on these issues.”

CITATION: Mariel Borowitz, “Government data, commercial cloud: Will public access suffer?” (Science, 2019)

Research News
Georgia Institute of Technology
177 North Avenue
Atlanta, Georgia  30332-0181  USA

Media Relations Contact: John Toon (404-894-6986) (jtoon@gatech.edu).

Writer: John Toon