Realtime Variant Warehouse

The engine behind Mosaic: a population-scale variant warehouse that enables complex annotation, phenotype, and genotype queries in real time.

From the creators of iobio.io

Store tens of thousands of whole exomes and run powerful queries that cut across all of an organization's data.

Designed to support real-time visualization and analytics, our variant warehouse can execute most variant-focused queries in under 1 second. Unlike most solutions, which rely on fixed sample sets, ours is truly real-time: the sample set can be defined on the fly or even derived from other criteria such as phenotypes.

Create complex queries that combine genotype, phenotype, annotation, and metadata criteria. For example: give me all the variants in ((gene A) or (gene B) or (random region C)) that have a gnomAD allele frequency less than 0.01, are labeled as 'Pathogenic' in ClinVar, and are SNPs. Now let me sort by AlternateAlleleFrequency. Now let me select the samples that are homozygous for these variants from individuals who are smokers and older than 60.
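To make the example query concrete, here is a minimal sketch of how it might look as SQL. The warehouse's actual schema is not described on this page, so every table and column name below (variants, samples, genotypes, gnomad_af, and so on) is hypothetical, and Python's built-in SQLite stands in for Postgres only to keep the sketch self-contained. The region criterion is left out for brevity.

```python
import sqlite3

# Hypothetical schema for illustration only; the real warehouse schema may differ.
# SQLite (in memory) is used here as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variants (
    id INTEGER PRIMARY KEY,
    gene_symbol TEXT,
    variant_type TEXT,
    gnomad_af REAL,
    clinvar_significance TEXT,
    alternate_allele_frequency REAL
);
CREATE TABLE samples (id INTEGER PRIMARY KEY, smoker INTEGER, age INTEGER);
CREATE TABLE genotypes (
    variant_id INTEGER REFERENCES variants(id),
    sample_id INTEGER REFERENCES samples(id),
    zygosity TEXT
);
""")

# Made-up toy data, just enough to exercise the filters.
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "GENE_A", "SNP", 0.001, "Pathogenic", 0.20),
    (2, "GENE_B", "SNP", 0.005, "Pathogenic", 0.05),
    (3, "GENE_A", "INDEL", 0.001, "Pathogenic", 0.10),  # not a SNP
    (4, "GENE_B", "SNP", 0.200, "Benign", 0.30),        # too common, not pathogenic
])
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
    (10, 1, 65),  # smoker, older than 60
    (11, 0, 70),  # non-smoker
    (12, 1, 45),  # smoker, but too young
])
conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?)", [
    (1, 10, "hom_alt"), (1, 11, "hom_alt"),
    (2, 12, "hom_alt"), (2, 10, "het"),
    (3, 10, "hom_alt"),
])

# Step 1: variants in gene A or gene B that are rare, pathogenic SNPs.
VARIANT_FILTER = """
    gene_symbol IN ('GENE_A', 'GENE_B')
    AND gnomad_af < 0.01
    AND clinvar_significance = 'Pathogenic'
    AND variant_type = 'SNP'
"""

# Step 2: sort the matching variants by alternate allele frequency.
variant_ids = [row[0] for row in conn.execute(
    f"SELECT id FROM variants WHERE {VARIANT_FILTER} "
    "ORDER BY alternate_allele_frequency"
)]

# Step 3: samples homozygous for any of those variants, restricted by phenotype.
sample_ids = [row[0] for row in conn.execute(f"""
    SELECT DISTINCT s.id
    FROM samples s
    JOIN genotypes g ON g.sample_id = s.id
    WHERE g.zygosity = 'hom_alt'
      AND s.smoker = 1 AND s.age > 60
      AND g.variant_id IN (SELECT id FROM variants WHERE {VARIANT_FILTER})
    ORDER BY s.id
""")]

print(variant_ids, sample_ids)
```

Note that the sample set in step 3 is not fixed in advance: it is determined at query time from the phenotype columns, which is the on-the-fly behavior described above.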

Add your own custom annotations, as well as a host of industry-standard annotations, without sacrificing scale or speed.

ClinVar Significance
Gene Consequence
Gene Impact
Gene Symbol
gnomAD AF Popmax

gnomAD Allele Count
gnomAD Allele Frequency
gnomAD Allele Number
gnomAD Homozygous Count
gnomAD Popmax

The variant warehouse is built on top of Postgres, which means it's rock-solid, secure, and has millions of developer-hours behind it. The ecosystem of plugins and documentation available for Postgres makes the variant warehouse extremely extensible, so it can be customized to fit your organization's requirements now and in the future. Additionally, the variant warehouse (1) is transactional, which helps ensure that complicated data operations, or operations over a flaky or distributed network, don't leave the system in a bad state; (2) has a robust multi-tenant model built in, so there won't be concurrency issues; and (3) uses SQL, so it should be easy to integrate with existing pipelines, workflows, and applications.
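As a rough illustration of why transactions matter for data loading, the sketch below inserts a batch of rows atomically: if any row fails, the whole batch is rolled back and the table is left as it was. It is an assumption-laden toy, not the warehouse's loader. The table, the load_batch helper, and the data are hypothetical, and Python's built-in SQLite again stands in for Postgres.

```python
import sqlite3

# Hypothetical table; SQLite stands in for Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (id INTEGER PRIMARY KEY, gene_symbol TEXT NOT NULL)"
)
conn.commit()


def load_batch(conn, rows):
    """Insert all rows atomically; return True on success, False if rolled back."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.executemany("INSERT INTO variants VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


ok = load_batch(conn, [(1, "GENE_A"), (2, "GENE_B")])

# The second row reuses primary key 1, so the whole batch is rejected,
# including the otherwise valid first row.
failed = load_batch(conn, [(3, "GENE_C"), (1, "GENE_DUP")])

count = conn.execute("SELECT COUNT(*) FROM variants").fetchone()[0]
print(ok, failed, count)
```

The same all-or-nothing pattern applies when a network connection drops partway through a load: the uncommitted work is discarded rather than left half-applied.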

Having Postgres as a foundation means it is easy to run locally, in the cloud, or even as a managed cloud service (e.g. AWS RDS). The variant warehouse is relatively fast to load, with load times on the same order as the time it takes to fully read the VCF file. It requires extra disk space because it is optimized for speed, whereas VCF files are optimized for disk usage. Although the variant warehouse will require much more space than the related VCF files, the increase in disk usage for your entire project (including alignment data) will be relatively small (< 5%).