Data Submission

This page describes the process and documentation required to submit data to NIAGADS. Depending on the size of your data, a member of the NIAGADS team will work with you on data transfer. Contact help@niagads.org to deposit data or with any questions.

For details on applicable storage costs at NIAGADS, please see NIAGADS Data Storage Cost Estimates.

Required Policy Documents

Please email the following required documents to help@niagads.org in order to deposit and share your data:

Institutional Certification for ADRD Studies that covers all subjects in your study. Multiple certifications may be required.
Signed copy of the NIA AD Genomics Sharing Plan.
Data Registration Template

All documents related to the application should be provided in English. For institutions where English is not the primary language, please provide translations of documents along with the original document. Translated documents should be signed by the institutional signing official.

Data Submission Checklist

In addition to the documentation above, all data submissions must include the following:

MD5 checksum for every submitted data file
README in plain text (.txt), PDF (.pdf), or Microsoft Word (.doc or .docx) that includes:

Description of the dataset and concise description of the study design
Platform or array
Any version information
List of included files and formats, and data dictionary
Contributor contact information
Reference genome build of the dataset
Publications
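The checksum requirement can be met with the standard md5sum utility. As a portable alternative, the Python sketch below writes one line per file in the "<digest>  <relative path>" layout that md5sum -c can verify. The manifest name md5sums.txt and the one-manifest-per-directory layout are assumptions, not NIAGADS requirements; confirm the preferred format with help@niagads.org.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_manifest(data_dir: Path, manifest_name: str = "md5sums.txt") -> Path:
    """Write md5sum-style lines ('<digest>  <relative path>') for every file under data_dir.

    The manifest name is a hypothetical default; the manifest itself is skipped.
    """
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        if path == manifest:
            continue  # never checksum the manifest itself
        lines.append(f"{md5sum(path)}  {path.relative_to(data_dir).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")
    return manifest
```

For example, write_md5_manifest(Path("submission")) writes submission/md5sums.txt covering every file under that directory, and running md5sum -c md5sums.txt from inside the directory re-verifies the files.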

Additional documentation requirements for each data type are listed below. If your data type is not listed, please contact us at help@niagads.org for assistance.

Genotype

Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
APOE Genotypes (if applicable)
Genotypes in PLINK or VCF file format
Consent level as specified in the Institutional Certification form for each subject
List of cohorts included and a description for each
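Before sending the Phenotype Data File, a quick self-check can catch ragged rows and broken pedigree links. The sketch below assumes PLINK-style pedigree columns named FID, IID, FATHER and MOTHER, with 0 marking an unknown parent; these column names are assumptions, so adjust them to match your own data dictionary.

```python
import csv
from typing import List


def check_phenotype_file(path: str, id_col: str = "IID",
                         father_col: str = "FATHER", mother_col: str = "MOTHER",
                         missing: str = "0") -> List[str]:
    """Return a list of problems found in a tab-delimited phenotype/pedigree file.

    Column names and the missing-parent code are hypothetical PLINK-style defaults.
    """
    problems: List[str] = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = list(reader)

    # Every row must have exactly as many fields as the header.
    records = []
    for line_no, row in enumerate(rows, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
        else:
            records.append(dict(zip(header, row)))

    # Sample IDs must be unique across the file.
    seen = set()
    for record in records:
        sample_id = record[id_col]
        if sample_id in seen:
            problems.append(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)

    # Each parent must be marked missing or appear as a sample in the file.
    for record in records:
        for col in (father_col, mother_col):
            parent = record.get(col, missing)
            if parent != missing and parent not in seen:
                problems.append(f"{record[id_col]}: {col} {parent} not found in file")
    return problems
```

An empty return value means the file passed these structural checks; it says nothing about whether the phenotype values themselves are correct.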

Summary statistics/association results

Results files in .txt format

Whole genome or whole exome sequencing

Sequencing read data can be submitted in any of the following formats:

FASTQ: please save all reads, including those that could not be mapped to the reference genome.
BAM: please save all reads, including those that could not be mapped to the reference genome.
CRAM: please save all reads, including those that could not be mapped to the reference genome.
VCF: standard VCF 4.2 format (we recommend splitting files by chromosome and compressing them with gzip)

Please also provide the following information:

Sequencing center
Sequencer machine
Read length
PCR-free or PCR-amplified library
Kit name and version
Copy of the WES target regions (if applicable)
Sequencing quality control metrics
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
APOE Genotypes (if applicable)
Genotypes in PLINK or VCF file format
Consent level as specified in the Institutional Certification form for each subject
List of cohorts included and a description for each
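The VCF recommendation above (one file per chromosome, compressed) can be sketched with the Python standard library. This is an illustration under stated assumptions: the output naming scheme <prefix>.<chrom>.vcf.gz is hypothetical, and Python's gzip module writes plain gzip rather than the BGZF format that tabix indexing requires, so if indexed files are needed, bgzip from htslib is the usual tool.

```python
import gzip
from pathlib import Path
from typing import Dict, List, TextIO


def split_vcf_by_chrom(vcf_path: str, out_dir: str, prefix: str = "study") -> List[Path]:
    """Split a plain or gzipped VCF into one gzip-compressed VCF per chromosome.

    All header lines ('#...') are copied into every output file. Records keep their
    original order, so a coordinate-sorted input yields sorted per-chromosome files.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if vcf_path.endswith(".gz") else open
    header: List[str] = []
    handles: Dict[str, TextIO] = {}
    paths: List[Path] = []
    try:
        with opener(vcf_path, "rt") as vcf:
            for line in vcf:
                if line.startswith("#"):
                    header.append(line)  # meta-information and column header lines
                    continue
                chrom = line.split("\t", 1)[0]
                if chrom not in handles:
                    # Hypothetical naming scheme: <prefix>.<chrom>.vcf.gz
                    path = out / f"{prefix}.{chrom}.vcf.gz"
                    handle = gzip.open(path, "wt")
                    handle.writelines(header)
                    handles[chrom] = handle
                    paths.append(path)
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

One file handle stays open per chromosome, which is fine for a standard assembly but may hit open-file limits if the VCF contains thousands of unplaced contigs.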

RNA-seq or microarray data

Sequencing read data

Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. BAM files should contain all reads, including those that could not be mapped to the reference genome
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
RNA extraction protocol (e.g., TRIzol/chloroform extraction, Qiagen RNeasy kit)
RNA integrity (RIN) per sample
Library preparation protocol (e.g., polyA capture, adapters used for ligation, read length and sequencing machine, single-cell platform)

Consent level as specified in the Institutional Certification form for each subject
List of cohorts included and a description for each
OPTIONAL: QC report per sample (e.g., library characteristics (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)

Summary Data

Read abundance files can be submitted as summaries in tab-separated file format with explanations
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README File:

Sample source and organism; provide protocol details if iPSCs
How the raw data were generated and processed (steps involved, e.g., how mapping was done, how multi-mapping reads were handled)
Raw data and library preparation protocol information (e.g., polyA capture, sequencing machine)
Unit of quantification in these summary files (e.g., genes, exons, etc.)
Annotation source and version (e.g., ENSEMBL version 94)
Unit of counts (e.g., raw counts, RPKM values, UMI counts). If normalization was performed or technical variation/batch effects were accounted for, please provide details
Software name and version used to generate those counts

OPTIONAL: QC report per sample (e.g., library information (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)
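A common problem with summary abundance files is a mismatch between the sample columns and the Phenotype Data File. The sketch below assumes a feature-by-sample TSV whose first column holds feature IDs (e.g., Ensembl gene IDs) and whose remaining header fields are sample IDs matching an IID column in the phenotype file; both layouts are assumptions rather than NIAGADS specifications.

```python
import csv
from typing import Dict, Set


def matrix_sample_ids(matrix_path: str) -> Set[str]:
    """Sample IDs from the header of a feature-by-sample TSV (first column = feature ID)."""
    with open(matrix_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    return set(header[1:])


def phenotype_sample_ids(pheno_path: str, id_col: str = "IID") -> Set[str]:
    """Sample IDs listed in the id_col column (hypothetical name) of a phenotype TSV."""
    with open(pheno_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[id_col] for row in reader}


def compare_samples(matrix_path: str, pheno_path: str, id_col: str = "IID") -> Dict[str, Set[str]]:
    """Report samples that appear in only one of the two files."""
    in_matrix = matrix_sample_ids(matrix_path)
    in_pheno = phenotype_sample_ids(pheno_path, id_col)
    return {
        "no_phenotype": in_matrix - in_pheno,  # matrix columns without a phenotype row
        "no_abundance": in_pheno - in_matrix,  # phenotype rows without a matrix column
    }
```

Two empty sets mean every sample in the abundance file has phenotype information and vice versa; any IDs reported are worth explaining in the README if the mismatch is intentional.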

Epigenetics studies (e.g., ChIP-seq, ATAC-seq)

Sequencing Read Data

Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. Save all reads, including those that could not be mapped to the reference genome. Background samples (input or mock IP samples) must also be included
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README File:

Sample source and organism; provide protocol details if iPSCs
Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)

Consent level as specified in the Institutional Certification form for each subject
List of cohorts included and a description for each
OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of aligned reads, coverage, insert size)

Summary Data

Processed peak files can be submitted in BED format with explanations (including significance of called peaks)
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README File:

Sample source and organism; provide protocol details if iPSCs
Description of all the BED columns
Software name and version used to generate these values, with processing details (e.g., how reads were filtered before peak calling, whether narrow or broad peaks were called, how p-values were corrected, if at all)

OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)
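For processed peak files, a few structural checks on the BED records help ensure the column descriptions in the README hold for every line. The sketch below checks only generic BED properties (a consistent column count, integer coordinates, and 0-based half-open intervals with start < end); it does not validate format-specific columns such as those of ENCODE narrowPeak files, and the minimum of three columns is the generic BED requirement rather than a NIAGADS rule.

```python
from typing import List


def check_bed(path: str, min_columns: int = 3) -> List[str]:
    """Basic sanity checks for a BED peak file: column count, integer coordinates, start < end."""
    problems: List[str] = []
    n_columns = None
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # blank, comment, or UCSC track/browser line
            fields = line.rstrip("\n").split("\t")
            if n_columns is None:
                n_columns = len(fields)  # first data line sets the expected width
            if len(fields) < min_columns or len(fields) != n_columns:
                problems.append(f"line {line_no}: unexpected column count {len(fields)}")
                continue
            try:
                start, end = int(fields[1]), int(fields[2])
            except ValueError:
                problems.append(f"line {line_no}: non-integer coordinates")
                continue
            # BED intervals are 0-based and half-open, so a peak needs 0 <= start < end.
            if start < 0 or end <= start:
                problems.append(f"line {line_no}: invalid interval {start}-{end}")
    return problems
```

The same check applies to the BED-formatted peak, fragment, and methylation files described in the sections that follow.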

Quantitative trait locus (QTL) analysis summary stats

Variant position: chr, start, end
Allele information: ref, alt, a1, a2
Feature name (e.g. gene name, protein name)
P-value and/or Q-value
Effect size (Beta and Beta SE), or Spearman correlation p value
OPTIONAL: Allele frequency or allele count
OPTIONAL: Feature location: chr, start, end
OPTIONAL: Cis/trans
OPTIONAL: in README file:

Detailed sample source, molecular trait and organism; provide protocol details if iPSCs
Description of all the columns
Software name and version used to perform the analyses
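The QTL field list above can be turned into a header check. The lowercase column names used here (chr, start, end, ref, alt, a1, a2, feature, pvalue, qvalue, beta, beta_se, spearman_pvalue) are hypothetical, because the guidance names the fields but not exact headers; rename them to match your files and describe the mapping in the README. Optional fields (allele frequency, feature location, cis/trans) are not checked.

```python
from typing import Dict, List

# Hypothetical header names for the required QTL fields.
REQUIRED: Dict[str, List[str]] = {
    "variant position": ["chr", "start", "end"],
    "allele information": ["ref", "alt", "a1", "a2"],
    "feature name": ["feature"],
}
# Each entry lists alternative column sets; at least one set must be fully present.
ALTERNATIVES: Dict[str, List[List[str]]] = {
    "significance": [["pvalue"], ["qvalue"]],
    "effect size": [["beta", "beta_se"], ["spearman_pvalue"]],
}


def missing_qtl_fields(header: List[str]) -> List[str]:
    """Describe required QTL fields that are absent from a header (case-insensitive)."""
    columns = {name.strip().lower() for name in header}
    missing: List[str] = []
    for label, names in REQUIRED.items():
        absent = [name for name in names if name not in columns]
        if absent:
            missing.append(f"{label}: {', '.join(absent)}")
    for label, options in ALTERNATIVES.items():
        if not any(all(name in columns for name in option) for option in options):
            choices = " or ".join("+".join(option) for option in options)
            missing.append(f"{label}: need {choices}")
    return missing


def missing_qtl_fields_in_file(path: str) -> List[str]:
    """Apply missing_qtl_fields to the first (header) line of a tab-delimited file."""
    with open(path) as handle:
        return missing_qtl_fields(handle.readline().rstrip("\n").split("\t"))
```

Keeping the alternatives as lists of column sets mirrors the "and/or" wording of the field list: either p-values or q-values satisfy the significance requirement, while the beta option needs both the estimate and its standard error.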

For RNA-seq or microarray data (including single-cell data)

Sequencing Read Data

Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. If submitting BAM, save all reads, including those that could not be mapped to the reference genome
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
RNA extraction protocol (e.g., TRIzol/chloroform extraction, Qiagen RNeasy kit)
RNA integrity (RIN) per sample
Library preparation protocol (e.g., polyA capture, adapters used for ligation, read length and sequencing machine, single-cell platform)

OPTIONAL: For BAM files, how the data were processed (e.g., how mapping was done, how multi-mapping reads were handled)
OPTIONAL: QC report per sample (e.g., library characteristics (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)

Summary/Processed data

Read abundance files can be submitted as summaries in tab-separated file format with explanations
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
How the raw data were generated and processed (steps involved, e.g., how mapping was done, how multi-mapping reads were handled)
Raw data and library preparation protocol information (e.g., polyA capture, sequencing machine, single cell platform)
Unit of quantification in these summary files (e.g., genes, exons, etc.)
Annotation source and version (e.g., ENSEMBL version 94)
Unit of counts (e.g., raw counts, RPKM values, UMI counts). If normalization was performed or technical variation/batch effects were accounted for, provide details
Software name and version used to generate those counts

OPTIONAL: QC report per sample (e.g., library information (total number of reads, sequencing read length), GC content, % of rRNAs, % of aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)

For epigenetics studies (e.g., ChIP-seq, ATAC-seq) (including single-cell data)

Sequencing Read Data

Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. If submitting BAM, save all reads, including those that could not be mapped to the reference genome. Background samples (input or mock IP samples) must also be included
Phenotype Data File in TSV (tab-delimited) format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)

OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of uniquely aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)

Summary/Processed data

Processed peak call files can be submitted in BED format with explanations (including significance of called peaks)
For ATAC-seq or similar protocols, fragment files (in BED format) can be submitted
Phenotype Data File in TSV (tab-delimited) format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
Description of all the BED columns
Software name and version used to generate these values, with processing details (e.g., how reads were filtered before peak calling, whether narrow or broad peaks were called, how p-values were corrected)

OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of uniquely aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)

For Methylation data (e.g., methylation array, bisulfite sequencing)

Sequencing Read Data / Raw Methylation Data

Sequencing read data can be submitted in either of the following formats: FASTQ or BAM. If submitting BAM, save all reads, including those that could not be mapped to the reference genome
Phenotype Data File in TSV (tab-delimited) format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
Library preparation protocol (e.g., adapters used for ligation, read length and sequencing machine)

OPTIONAL: QC report per sample (e.g., library size (total number of reads), % of uniquely aligned reads, coverage)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)

Summary/Processed data

Processed methylation sites/peak call files can be submitted in BED format with explanations (including significance of called peaks)
Phenotype Data File in TSV (tab-delimited) format (including pedigree structures if applicable)
In README File:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
Description of all the BED columns
Software name and version used to generate these values, with processing details (e.g., how reads were filtered before peak calling, whether narrow or broad peaks were called, how p-values were corrected)

OPTIONAL: QC report per sample (e.g., library size (total number of reads), GC content, % of uniquely aligned reads, coverage, insert size)
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)

Proteomics data

Mass Spec Related

Files in a standard mass spectrometer output format (e.g., mzML, mzXML)
A matrix of samples against peptide/protein information in .txt format
Phenotype Data File in TSV (tab-delimited) format (including pedigree structures if applicable)
In README file:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
Quantification method (e.g., label-free: intensity, TMT quantitation analysis)
Digestion method (e.g., in-solution digestion, on-bead digestion)
Online LC system (e.g., Agilent 1100 nano LC system, Agilent HPLC 1200 system, Dionex UltiMate 3000)
Mass spectrometer (e.g., LTQ Orbitrap, LTQ Orbitrap Velos, Q Exactive HF)
Protease (e.g., trypsin)
Fragmentation method (e.g., CID resonance-type, CID beam-type, high-energy collision-induced dissociation)
Peptide identification and annotation; protein annotation information
QC/normalization details and steps involved (including outlier detection)

OPTIONAL: QC report per sample

Protein Array Data

Read abundance files can be submitted as summaries in tab-separated file format with explanations
Provide the UniProt ID and name of each target protein measured
For SomaScan, provide RFU values (both raw and processed values are recommended)
For Olink, provide NPX values (both raw and processed values are recommended)
Phenotype Data File in tab-delimited format (including pedigree structures if applicable)
In README file:

Sample source (e.g., cell line, tissue, cell types) and organism; provide protocol details if iPSCs or single cells
How the raw data were generated (protein array platform, chip version)
Unit of quantification in these summary files (e.g., proteins)
Annotation source and version (e.g., UniProt version xx)
Unit of counts (e.g., raw counts, RPKM values, UMI counts). If normalization was performed or technical variation/batch effects were accounted for, please provide details
Software name and version used to generate those counts

OPTIONAL: QC report per sample
OPTIONAL: We highly recommend sharing the workflow via a code repository (e.g., GitHub, Bitbucket)
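For protein array submissions, each measured target should carry a UniProt ID. The sketch below flags values that do not match the accession pattern published in UniProt's documentation on accession numbers. The column name UniProtID is an assumption, and isoform identifiers (e.g., P12345-2) or several IDs in one cell would be flagged and need separate handling.

```python
import csv
import re
from typing import List

# Accession pattern from UniProt's documentation on accession numbers.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def invalid_uniprot_ids(matrix_path: str, id_col: str = "UniProtID") -> List[str]:
    """Return values in the (hypothetically named) ID column that are not UniProt accessions."""
    bad: List[str] = []
    with open(matrix_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            accession = row[id_col].strip()
            # fullmatch rejects partial matches such as isoform suffixes ("-2").
            if not UNIPROT_ACCESSION.fullmatch(accession):
                bad.append(accession)
    return bad
```

Values flagged here are often gene symbols entered in place of accessions; replacing them with the corresponding UniProt IDs keeps the annotation source and version stated in the README meaningful.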