Understanding Accession Numbers
Overview
NIAGADS uses four types of accession numbers. These unique identifiers help researchers identify and navigate the available data and metadata:
Dataset Accession Number - NGXXXXX
Dataset accession numbers correspond to the data for a particular set of experiments. A dataset can be made up of one or more studies or sample sets. When submitting a data access request in the DSS, researchers apply for access to data using the dataset accession number.
Study Accession Number - saXXXXXX
Study accession numbers correspond to the overarching study that datasets are part of. A study can be associated with one or more datasets. Datasets can be linked to more than one study if multiple studies were involved in generating the data.
Sample Set Accession Number - sndXXXXX
A sample set accession number is used as an identifier for a particular set of samples within a dataset. Often it corresponds to a set of samples run on a particular platform in a particular batch. More than one sample set can be included within a dataset if different assays or platforms were used to generate the data in the dataset.
A dataset may have no sample sets if it contains no sample-level data (i.e., all files are aggregate-level data or summary statistics).
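The relationships described above between datasets, studies, and sample sets can be sketched as a small data model. This is only an illustration, not how NIAGADS stores its records: the class name, field names, helper function, and the accession values in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record for one dataset and the accessions linked to it."""
    accession: str  # dataset accession, NGXXXXX style
    # A dataset can be linked to more than one study.
    study_accessions: list = field(default_factory=list)
    # A dataset with only aggregate-level files has no sample sets.
    sample_set_accessions: list = field(default_factory=list)

    def has_sample_level_data(self):
        """True when at least one sample set is attached to the dataset."""
        return bool(self.sample_set_accessions)


def datasets_for_study(datasets, study_accession):
    """Return the accessions of all datasets linked to the given study."""
    return [d.accession for d in datasets if study_accession in d.study_accessions]
```

For example, with two made-up datasets `Dataset("NG00001", ["sa000001"], ["snd10001", "snd10002"])` and `Dataset("NG00002", ["sa000001", "sa000002"])`, calling `datasets_for_study` with `"sa000001"` returns both dataset accessions, and `has_sample_level_data()` is false for the second one.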
Fileset Accession Number - fsaXXXXXX
A fileset accession number is a way to group related files together, and can be used to:
Differentiate between files that are part of different experiments or analyses in the same dataset.
Differentiate between public and private access files.
Use the fileset accession number to narrow down the files you are looking for in the DSS.
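Because each accession type has its own prefix (NG, sa, snd, fsa), the type of an identifier can be recognized from the prefix alone. The sketch below shows one way to do this. It assumes that each X in the formats above stands for a digit and that prefixes are case-sensitive; it does not enforce a particular number of digits. The function and variable names are hypothetical and not part of any NIAGADS tool.

```python
import re

# Prefix patterns taken from the formats listed above. Assumption: the X
# placeholders are digits; the digit count is deliberately not enforced.
ACCESSION_PATTERNS = {
    "dataset": re.compile(r"NG\d+"),
    "study": re.compile(r"sa\d+"),
    "sample set": re.compile(r"snd\d+"),
    "fileset": re.compile(r"fsa\d+"),
}


def classify_accession(accession):
    """Return the accession type for an identifier, or None if unrecognized."""
    text = accession.strip()
    for kind, pattern in ACCESSION_PATTERNS.items():
        # fullmatch requires the whole identifier to fit the pattern.
        if pattern.fullmatch(text):
            return kind
    return None
```

With made-up identifiers, `classify_accession("NG00001")` gives `"dataset"` and `classify_accession("fsa000001")` gives `"fileset"`, while a string with an unknown prefix gives `None`.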
What Can I Find on Each Accession Page?
Each accession type except fileset has its own summary page with information about the dataset, sample set, or study. The type of information found on each page type is summarized below.
How Do I Find an Accession Number?
Below are screenshots and instructions on how to find the different accession numbers.
All accession numbers associated with a dataset can always be found in the dataset release notes.
Dataset
Dataset accession numbers can be found in the following places:
Study
Study accession numbers can be found in the following places:
Sample Set
Sample set accession numbers can be found in the following places:
Fileset
Fileset accession numbers can be found in the following places:
How Do I Use Accession Numbers in the DSS?
Once you have identified an accession number whose data you want to download, you can use it in the different tabs of the DSS to narrow down and select the files you are interested in. The screenshots below show how to use the different accession numbers in the Subject Information, Single Sample, and Multi-Sample and Metadata tabs.
Subject Information Tab
Single Sample Tab
Contains individual-level files (CRAMs, CRAM indexes, VCFs, gVCFs, VCF indexes, gVCF indexes).
Multi-Sample and Metadata Tab
Contains files that cover multiple samples or metadata for datasets (pVCF companion files, QC metrics, phenotype files, genotype files, WES target regions, etc.).