The National Cancer Institute’s Genomic Data Commons (GDC), launched in 2016 by then-Vice President Joseph Biden and hosted at the University of Chicago, has become one of the largest and most widely used resources in cancer genomics. It holds more than 3.3 petabytes of data from over 65 projects and more than 84,000 anonymized patient cases, and serves over 50,000 unique users each month.
In two articles published on February 22 in Nature Communications and Nature Genetics, the UChicago-based research team shares new details about the GDC, which is funded by the National Cancer Institute (NCI) through a subcontract with the Frederick National Laboratory for Cancer Research, currently operated by Leidos Biomedical Research, Inc. One article describes the design and operation of the GDC. The other describes the pipelines the GDC uses to harmonize submitted data and to generate the datasets used by the GDC research community.
The goal of the GDC is to provide the cancer research community with a repository of uniformly processed genomic data and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine.
Data production for what would become the GDC began in June 2015 using a private cloud. After just one year, the GDC had analyzed more than 50,000 raw sequencing data entries. The GDC includes genomic, transcriptomic, epigenomic, proteomic, clinical, and imaging data. The processing pipelines described in the Nature Communications article have produced more than 1,660 TB of data on more than two dozen primary cancer types. These data are stored in the GDC Data Portal, where they can be viewed and downloaded.
In addition to the Data Portal, the GDC offers other user resources: the GDC Data Analysis, Visualization, and Exploration (DAVE) tools for interactive data exploration by genomic variant or specific alteration; the GDC Data Submission Portal for submitting data; the GDC Data Transfer Tool (DTT) for uploading and downloading large genomic datasets; and the GDC Data Harmonization System, which allows users to run data submitted to the GDC through the harmonization pipelines.
“This data has a critical role to play,” said Robert Grossman, Ph.D., principal investigator of the GDC and director of the Center for Translational Data Science at UChicago. “As data accumulates, new signals will become easier to identify as important targets for understanding cancer biology. In addition, the data-sharing infrastructure can be used to inform research studies, providing new information on genetic variation between individuals and how it may affect cancer patients’ outcomes.”
Zhenyu Zhang et al., Uniform genomic data analysis in the NCI Genomic Data Commons, Nature Communications (2021). DOI: 10.1038/s41467-021-21254-9
Allison P. Heath et al., The NCI Genomic Data Commons, Nature Genetics (2021). DOI: 10.1038/s41588-021-00791-5
Citation: Genomic Data Commons provides unprecedented cancer data resource (2021, February 26) retrieved February 27, 2021 from https://medicalxpress.com/news/2021-02-genomic-commons-unprecedented-cancer-resource.html