Oct. 2, 2017


Report Offers Guidance to Federal Government on Creating a New Statistics Entity to Combine Data From Multiple Sources While Protecting Privacy

WASHINGTON -- A new report from the National Academies of Sciences, Engineering, and Medicine offers detailed recommendations to guide federal statistical agencies in creating a new entity that would enable them to combine data from multiple sources in order to provide more relevant, timely, and detailed statistics – for example, on the unemployment rate or the rate of violent crime. The report reviews options for structuring the new entity, identifies approaches for protecting individuals’ privacy while linking multiple sources of information, and pinpoints areas where staff training is needed.

The study committee’s previous report, released in January 2017, recommended the establishment of an entity to facilitate secure access to data from multiple sources. The new report builds on that recommendation, noting that a new entity to combine data may enable more detailed, timely updates to inform decision-makers on important economic, societal, and health indicators.  It offers more detail on the process for developing such an entity and provides recommendations for implementation.

“A great deal of recent public discussion – especially that prompted by the recent report of the Commission on Evidence-Based Policymaking -- has focused on the value of combining data and of creating a new entity to do so,” said Robert M. Groves, provost of Georgetown University in Washington, D.C., and chair of the committee that wrote the report. “We hope our report will complement and inform the commission’s efforts and help initiate a more detailed discussion among stakeholders about the best path forward for the federal statistical system.”  

Some federal agencies are already using multiple data sources to craft more useful datasets, the report notes. For example, the National Center for Health Statistics (NCHS) routinely links information gathered from the National Health Interview Survey with administrative records from the Centers for Medicare and Medicaid Services, which allows researchers to analyze the relationship between health and the uses and costs of medical care. However, individual agencies are pursuing their own efforts to link multiple data sources and are separately implementing changes to their systems, and such a decentralized approach incurs large opportunity costs and limits the potential benefits, the report says.

Emphasizing that privacy protection should be at the forefront of the new entity’s design, the report urges statistical agencies to train their technical staff in modern computer science technology – including secure multiparty computing, cryptography, privacy-preserving, and privacy-enhancing technologies -- so that they can better ensure security and enhance privacy protections. The report also identifies technological approaches that can minimize privacy risks; for example, secure multiparty computing could in some situations permit a statistical agency to compute a desired aggregate result without ever actually learning all the detailed data from the different data sources.
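To make the secure multiparty computing idea more concrete, the sketch below uses additive secret sharing, one common building block for computing a sum across several data holders. It is a hypothetical illustration, not a method described in the report: the function names `share` and `secure_sum`, the modulus `PRIME`, and the example counts are all assumptions. Each data holder splits its private count into random shares. Any single share looks like random noise, yet adding up all the partial sums recovers the exact total.

```python
import secrets

# Hypothetical modulus (a Mersenne prime); all arithmetic is done mod PRIME.
# Assumes every value and the final total are nonnegative and below PRIME.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate a secure sum among data holders, all run in one process here.

    In a real deployment each party would run on a separate system and
    exchange shares over secure channels.
    """
    n = len(private_values)
    # Each data holder splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up only the shares it received; it never sees a raw value.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Combining the partial sums reveals the aggregate and nothing more.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Hypothetical counts held by three separate data sources.
    counts = [1200, 845, 3310]
    print(secure_sum(counts))  # 5355
```

This toy version assumes honest-but-curious parties that follow the protocol; production systems add secure communication and protections against participants who deviate from it.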

The report also recommends instituting an advisory committee on privacy to inform and advise the new entity on policies and current best practices. The entity could also serve as a valuable center for coordinating research across the federal statistical system and the academic community on the application and evaluation of privacy-preserving and privacy-enhancing techniques for federal statistics.

To fulfill these roles, the entity needs strong legal authority to protect the confidentiality of data accessed through it and to ensure that the data are used only for statistical purposes. In addition, the entity’s legal foundation should foster independence from political and other undue external influence in providing access to, linking, and analyzing data, and in producing and disseminating statistical information. The new entity should also maximize the transparency of its statistical activities by posting a summary of the data sources accessed through the entity on a public website. For each study, the summary should include its purpose and public benefit, the data sources used, a brief description of the methodology, and links to resulting statistical products.

The report discusses the advantages and disadvantages of various options for locating the new entity, such as in a federal statistical agency, a federally funded R&D center, or a university-based public-private research center. Regardless of where the entity is established, federal statistical agencies should create partnerships with academia and external research organizations to develop the new methods needed for design and analysis using multiple data sources, the report says.

The report also offers recommendations about governance of the proposed new entity, noting that the governance structure will need to obtain input from all of the statistical agencies and address their needs. The entity will also serve, and have responsibilities to, data providers and data users. Its director should report to a board of directors that includes representatives of the federal statistical agencies, experts on privacy, holders of data used by the entity, and users of statistical data.

Recognizing that much research is needed before many federal statistical programs can incorporate multiple data sources, the committee recommended that the transition be gradual, taking place in phases to accommodate changes in system architectures, data access, and staffing. The report suggests that the first phase take place over the course of five years, after which a comprehensive review would assess the demonstrated benefits to federal statistics.

The study was sponsored by the Laura and John Arnold Foundation with additional support from the National Academy of Sciences Kellogg Fund.  The National Academies of Sciences, Engineering, and Medicine are private, nonprofit institutions that provide independent, objective analysis and advice to the nation to solve complex problems and inform public policy decisions related to science, technology, and medicine.  The National Academies operate under an 1863 congressional charter to the National Academy of Sciences, signed by President Lincoln.  For more information, visit http://national-academies.org

Kacey Templin, Media Relations Officer
Andrew Robinson, Media Relations Assistant
Office of News and Public Information
202-334-2138; e-mail news@nas.edu


Division of Behavioral and Social Sciences and Education
Committee on National Statistics
Panel on Improving Federal Statistics for Policy and Social Science Research
Using Multiple Data Sources and State-of-the-Art Estimation Methods

Robert M. Groves¹,² (chair)
Gerard Campbell Professor
Department of Mathematics and Statistics and Department of Sociology, and
Executive Vice President and Provost
Georgetown University
Washington, D.C.

Michael E. Chernew²
Department of Health Care Policy
Harvard Medical School

Piet Daas
Senior Methodologist
Department of Corporate Services, Information Technology, and Methodology, and
Data Scientist
Center for Big Data Statistics
Statistics Netherlands
Heerlen, Netherlands

Cynthia Dwork¹,³
Gordon McKay Professor of Computer Science
Harvard Paulson School of Engineering,
Radcliffe Alumnae Professor
Radcliffe Institute for Advanced Study; and
Affiliated Faculty at Harvard Law School

Ophir Frieder
Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt, L.C.H.S. Chair in Computer Science and Information Processing
Georgetown University, and
Professor of Biostatistics, Bioinformatics, and Biomathematics
Georgetown University Medical Center
Washington, D.C.

Hosagrahar V. Jagadish
Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, and
Distinguished Scientist
Institute for Data Science
University of Michigan
Ann Arbor

Frauke Kreuter
Joint Program in Survey Methodology
University of Maryland
College Park;
Professor, Statistics and Methodology
University of Mannheim, Germany; and
Statistical Methods Group
German Institute for Employment Research
Nuremberg, Germany

Sharon Lohr
Vice President and Senior Statistician
Rockville, Md.

James P. Lynch
Professor and Chair
Department of Criminology and Criminal Justice
University of Maryland
College Park

Colm A. O’Muircheartaigh
Harris School of Public Policy Studies, and
Senior Fellow
National Opinion Research Center
University of Chicago

Trivellore Raghunathan
Director and Research Professor
Survey Research Center
Institute for Social Research,
Professor of Biostatistics, and
Associate Director
Center for Research on Ethnicity, Culture, and Health
School of Public Health
University of Michigan
Ann Arbor; and
Research Professor, Joint Program in Survey Methodology
University of Maryland
College Park

Roberto Rigobon
Society of Sloan Fellows Professor of Management, and
Professor of Applied Economics
Sloan School of Management
Massachusetts Institute of Technology
Cambridge, Mass.; and
Visiting Professor
Institute of Advanced Studies in Administration
Caracas, Venezuela

Marc Rotenberg
Electronic Privacy Information Center, and
Professor of Information Privacy Law and Open Government
Georgetown University Law Center
Washington, D.C.


Brian Harris-Kojetin
Staff Officer

¹Member, National Academy of Sciences
²Member, National Academy of Medicine
³Member, National Academy of Engineering