The AIRR Data Commons

One of the goals of both the AIRR Community and the iReceptor Project is to make it easy for researchers to find, share, compare, and reuse AIRR-seq data (antibody/B-cell and T-cell receptor repertoires). Two main initiatives developed by the AIRR Community support these goals. First, the MiAIRR Standard is a set of standards and protocols for curating and sharing these large and complex repertoire data sets. Second, the AIRR Data Commons (ADC) is a distributed system of AIRR-seq data repositories that follow these standards and therefore share a common data model, a common query language, and common interoperability formats for storing, querying, and downloading AIRR-seq data. The iReceptor Project provides a science gateway that makes this sharing a reality for our users. Our implementation consists of three fundamental components:

  1. a database software stack for storing AIRR-seq data (the iReceptor Turnkey Repository),
  2. a web-based API for querying such an AIRR-seq repository (the iReceptor Web API), and
  3. the iReceptor Scientific Gateway, a scientific web portal which can query one or more distributed AIRR-seq repositories to find, explore, and analyze the data in those repositories.
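
The way a gateway fans one query out to several repositories can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the iReceptor implementation: each repository is modeled as an in-memory object instead of a real web API call, the names `Repository`, `in_memory_repository`, `equals_filter`, and `federated_search` are hypothetical, and the filter shape and the flat field names (`study_id`, `repertoire_id`) are simplified placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Query = Dict[str, Any]
Repertoire = Dict[str, str]


def equals_filter(field: str, value: str) -> Query:
    # Assumed query shape: a single equality filter on one field.
    return {"filters": {"op": "=", "content": {"field": field, "value": value}}}


@dataclass
class Repository:
    name: str
    # Hypothetical stand-in for a call to the repository's web API.
    search: Callable[[Query], List[Repertoire]]


def in_memory_repository(name: str, repertoires: List[Repertoire]) -> Repository:
    # Build a repository whose search runs against a local list of records.
    def search(query: Query) -> List[Repertoire]:
        content = query["filters"]["content"]
        return [r for r in repertoires if r.get(content["field"]) == content["value"]]

    return Repository(name, search)


def federated_search(
    repositories: List[Repository], query: Query
) -> Tuple[Dict[str, List[Repertoire]], List[str]]:
    # Send the same query to every repository and group the hits by repository.
    # A repository that fails is recorded and skipped, so one outage does not
    # block results from the others.
    results: Dict[str, List[Repertoire]] = {}
    failed: List[str] = []
    for repo in repositories:
        try:
            results[repo.name] = repo.search(query)
        except Exception:
            failed.append(repo.name)
    return results, failed


# Two independently managed repositories holding made-up records.
repo_a = in_memory_repository(
    "repo_a",
    [
        {"repertoire_id": "a1", "study_id": "S1"},
        {"repertoire_id": "a2", "study_id": "S2"},
    ],
)
repo_b = in_memory_repository("repo_b", [{"repertoire_id": "b1", "study_id": "S1"}])

query = equals_filter("study_id", "S1")
hits, failed = federated_search([repo_a, repo_b], query)
```

Because every repository accepts the same query and returns records in the same shape, the gateway needs no repository-specific logic. In a real deployment the `search` callable would presumably wrap an HTTP request to each repository's web API.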

Our goal is not to create a single, large repository but to federate many large, distributed repositories. Two reasons drive this distributed approach:

  1. A single central repository cannot practically scale to the size required by the continually growing number of AIRR-seq studies and the volume of AIRR-seq data that those studies generate.
  2. Due to data privacy and ethics requirements, data stewards need to control and manage their own data and will be reluctant to upload their data to an external repository.

Rather than a small number of large repositories, we envision many (tens or hundreds of) institutional AIRR-seq data repositories, each managed and controlled by a local data steward, connected together in what we call the “AIRR Data Commons”.