Requirements for Secondary Data Return Plan (SDRP)

Introduction

In alignment with the NIA’s commitment to making genetic and genomic data widely available to researchers, and in compliance with the NIH Genomic Data Sharing (GDS) Policy, the NIA requires that all derived secondary data and analysis results produced by research that accesses and uses NIAGADS data be submitted to NIAGADS.

Registering with dbGaP

All studies generating human genomic data that fall within the scope of the NIH GDS Policy must first register in the Database of Genotypes and Phenotypes (dbGaP)—NIH's central repository for human genomic and associated phenotypic data—even if the data will be submitted elsewhere.

Certificate of Confidentiality

The NIH encourages non-NIH-funded investigators and institutions submitting large-scale human genomic datasets to dbGaP to seek a Certificate of Confidentiality as an additional safeguard against compelled disclosure of any personally identifiable information they may hold. NIH-funded studies are automatically covered under such a certificate.

These measures accelerate biomedical research discovery, enhance research rigor and reproducibility, provide accessibility to high-value datasets, and promote data reuse for future research studies resulting from NIAGADS data access and use.

As part of the Data Access Request application for authorization to access and use NIAGADS datasets, investigators must prepare and submit a Derived/Secondary Data Return Plan (SDRP). Instructions and examples can be found below. In this context, “derived data” are defined as the data generated through your use of data accessed through NIAGADS.

Instructions

Your Derived/Secondary Data Return Plan (SDRP) should include the following, in alignment with the Research Use Statement described in the Data Access Request application:

A label at the top of the document reading “Secondary Data Return Plan”, together with the investigator’s name and project title
Anticipated work products, including:
  1. The type(s) of data that will be submitted to NIAGADS
  2. Whether source code will need to be submitted
A statement regarding compliance with NIA and NIH genomic data sharing policies and a commitment to cooperating with NIAGADS to ensure timely submission of your derived secondary data and analysis results to the research community:

We are committed to making derived secondary data and analysis results available to the research community after publication. We will follow the policy outlined in the NIA Alzheimer’s Disease Sharing Policy document (http://www.nia.nih.gov/research/dn/alzheimers-disease-genetics-sharing-plan/) and the NIAGADS Data Distribution Agreement.

We will contact NIAGADS to start the process of submitting derived data when a publication has been accepted or a patent has been filed, so that the data become available to the public in a timely fashion, in compliance with the NIH Genomic Data Sharing (GDS) Policy: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-14-124.html

Please see the examples below. Note that they represent a sampling of potential SDRPs and are not all-encompassing. Use them as a starting point and modify them to fit your own research needs.

Please reach out to niagads@pennmedicine.upenn.edu should you need assistance.
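
For investigators who prefer to draft the plan programmatically, the elements listed in the instructions above could be assembled roughly as follows. This is only a sketch: the `ReturnPlan` class, its field names, and the output layout are hypothetical and are not a format required by NIAGADS. The compliance statement is left as a placeholder to be filled in with your own text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReturnPlan:
    """Hypothetical container for the elements an SDRP should include."""
    investigator: str
    project_title: str
    data_types: List[str] = field(default_factory=list)
    source_code_submitted: bool = False
    compliance_statement: str = "[compliance statement goes here]"

    def render(self) -> str:
        """Return the plan as plain text, with the label on the first line."""
        lines = [
            "Secondary Data Return Plan",
            f"Investigator: {self.investigator}",
            f"Project title: {self.project_title}",
            "",
            "Anticipated work products",
            "We will submit to NIAGADS:",
        ]
        # One numbered entry per type of data to be submitted.
        lines += [f"  {i}. {item}" for i, item in enumerate(self.data_types, start=1)]
        answer = "yes" if self.source_code_submitted else "no"
        lines.append(f"Source code will be submitted: {answer}")
        lines += ["", "Compliance statement", self.compliance_statement]
        return "\n".join(lines)


plan = ReturnPlan(
    investigator="Jane Doe",            # placeholder name
    project_title="Example project",    # placeholder title
    data_types=["Genome-wide association summary statistics"],
    source_code_submitted=True,
)
print(plan.render())
```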

Examples

1. Somatic variant analysis 1

Work Products:

We will submit to NIAGADS:

Somatic variant analysis: all called mutations (SNVs, indels, SVs) in VCF or other commonly used file format, including genotypes if available.
Bioinformatic annotations of variants that are used in published analysis
Pathway analysis: significance and summary statistics of all pathways/gene annotation terms tested

2. Somatic variant analysis 2

Work Products:

We will submit to NIAGADS:

Case/control or endophenotype association studies: genome-wide association significance, allele frequency, and summary statistics
Family studies: genome-wide linkage/IBD/co-segregation significance and summary statistics
Structural variant analysis: called SVs in VCF or other commonly used file format, including genotypes if available. When SVs are used for association/linkage studies, we will provide the genome-wide summary statistics outlined in the first two items above (case/control or endophenotype association studies and family studies).
Bioinformatic annotations of variants that are used in published analyses
Pathway analysis: significance and summary statistics of all pathways/gene annotation terms tested
Individual-level phenotypes, endophenotypes, gene expression levels, and covariates that are not provided by ADSP and are required to replicate published findings
Published population structure estimates, genome-wide genotype imputations, and phased haplotypes.
If sequencing data are processed using a different workflow to generate SNP and indel variant calls and used in publications, we will provide all called variants in VCF format.

3. Whole-genome association study

Work Products:

We will submit to NIAGADS:

Whole-genome association study results: summary statistics of genetic variants, including p-values, effect sizes, and allele frequencies.
Regional plots: visual representations of genomic regions showing associated variants, annotated with relevant genomic features.
Manuscript with detailed methods and findings: a comprehensive document outlining the study design, statistical analyses, and interpretation of results.
Code sharing: R and Python scripts used for data preprocessing, quality control, and statistical analysis to ensure transparency and reproducibility.
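
As an illustration of the summary statistics named in this example (p-values, effect sizes, and allele frequencies), the sketch below checks a tab-delimited file for those fields before submission. The column names `variant_id`, `p_value`, `beta`, and `effect_allele_freq` and the function `check_summary_stats` are assumptions made for this illustration. This document does not specify a required file layout, so confirm the expected format with NIAGADS.

```python
import csv
import io

# Hypothetical column names; this document does not prescribe a layout.
REQUIRED_COLUMNS = ("variant_id", "p_value", "beta", "effect_allele_freq")


def check_summary_stats(handle):
    """Return a list of problems found in a tab-delimited summary-statistics file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            p = float(row["p_value"])
            freq = float(row["effect_allele_freq"])
            float(row["beta"])  # effect size only needs to be numeric
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not 0.0 <= p <= 1.0:
            problems.append(f"line {line_no}: p_value outside [0, 1]")
        if not 0.0 <= freq <= 1.0:
            problems.append(f"line {line_no}: effect_allele_freq outside [0, 1]")
    return problems


# Small in-memory example with made-up values; the second row has an invalid p-value.
example = io.StringIO(
    "variant_id\tp_value\tbeta\teffect_allele_freq\n"
    "chr1:1000:A:G\t0.03\t0.12\t0.25\n"
    "chr1:2000:C:T\t1.5\t-0.40\t0.10\n"
)
print(check_summary_stats(example))
```

An empty list means no problems were detected by these particular checks. It does not mean the file meets any submission requirement.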

4. Genome-wide array study

Work Products:

We will submit to NIAGADS:

Genome-wide array analysis results: summary statistics of SNP associations, including p-values and effect sizes.
Population-specific analyses: stratified results for different demographic groups to explore potential variations in genetic associations.
Interactive data visualization tools: web-based tools for exploring the genome-wide array results, enhancing accessibility for researchers with varying levels of expertise.
Documentation of quality control procedures: detailed protocols and reports ensuring the reliability and validity of the analyzed genome-wide array data.