MOAT: Mutations Overburdening Annotations Tool

MOAT (Mutations Overburdening Annotations Tool) is a computational system that identifies significant mutation burdens in genomic elements using an empirical, nonparametric method. Given a set of variant calls and a set of annotations, both of which may be drawn from anywhere in the human genome, MOAT calculates the observed mutation count of each annotation and compares it to the expected mutation count to detect elevated mutation burdens. The expected mutation count is derived by simulating the distribution of background mutations. To produce this distribution, MOAT offers two permutation algorithms: one that permutes the locations of annotations (MOAT-a) and one that permutes the locations of variants (MOAT-v).
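The comparison of observed and expected counts can be illustrated with a minimal Python sketch. It is not MOAT's implementation. It assumes a single chromosome, half-open intervals, and uniform reshuffling of variants within a hypothetical local window, and it ignores the sequence context that the variant permutation preserves. All names (`empirical_burden_pvalue`, `window`, `n_permutations`) are illustrative, and the added pseudocount of one is a common convention for empirical p-values rather than something documented for MOAT.

```python
import bisect
import random


def count_in_interval(sorted_positions, start, end):
    """Count positions p with start <= p < end in a sorted list."""
    lo = bisect.bisect_left(sorted_positions, start)
    hi = bisect.bisect_left(sorted_positions, end)
    return hi - lo


def empirical_burden_pvalue(variants, annotation, window,
                            n_permutations=1000, seed=0):
    """Return (observed_count, empirical_p) for one annotation.

    Assumption: variants inside `window` are reshuffled uniformly at
    random within that window to simulate the background distribution.
    """
    rng = random.Random(seed)
    a_start, a_end = annotation
    w_start, w_end = window

    # Only variants inside the local window take part in the permutation.
    local = sorted(p for p in variants if w_start <= p < w_end)
    observed = count_in_interval(local, a_start, a_end)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        permuted = sorted(rng.randrange(w_start, w_end) for _ in local)
        if count_in_interval(permuted, a_start, a_end) >= observed:
            at_least_as_extreme += 1

    # Pseudocount keeps the estimate strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

For example, twenty variants packed into a 20 bp annotation inside a 10 kb window yield a very small empirical p-value, while an annotation with no variants yields a p-value of 1.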

MOAT’s annotation permutation algorithm is amenable to parallelization on graphics processing units (GPUs) using Nvidia’s CUDA framework because of its high computational intensity and low memory requirements. MOAT’s variant permutation algorithm, however, requires loading the human reference genome into memory, which makes it better suited to parallelization across multiple CPUs with the OpenMPI framework.

Additionally, we provide a variant distribution simulator called MOATsim, which produces permutations of the input variants that preserve trinucleotide identity (similar to MOAT-v) and that account for the distribution of whole genome covariates that influence the background mutation rate. Regions across the whole genome are clustered into groups with similar covariate signal profiles. Input variants that intersect a region are permuted among all regions in the same cluster, and each new location preserves the variant’s original trinucleotide identity. MOATsim’s parallel implementation uses the OpenMPI framework.
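The idea of moving a variant to a site with the same trinucleotide context can be sketched as follows. This is an illustration under stated assumptions, not MOATsim's code: the reference is a plain Python string, `regions` stands in for the non-overlapping regions of one covariate cluster, coordinates are 0-based and half-open, strand and reverse complements are ignored, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def index_trinucleotides(sequence, regions):
    """Map each trinucleotide to the centre positions where it occurs
    inside the given (start, end) regions."""
    index = defaultdict(list)
    for start, end in regions:
        # Skip the first and last base, which lack a full 3-base context.
        for pos in range(max(start, 1), min(end, len(sequence) - 1)):
            index[sequence[pos - 1:pos + 2]].append(pos)
    return index


def permute_variants(sequence, variant_positions, regions, seed=0):
    """Move each variant to a random site in `regions` that shares its
    trinucleotide context. A variant with no matching site stays put."""
    rng = random.Random(seed)
    index = index_trinucleotides(sequence, regions)
    permuted = []
    for pos in variant_positions:
        context = sequence[pos - 1:pos + 2]
        candidates = index.get(context)
        permuted.append(rng.choice(candidates) if candidates else pos)
    return permuted
```

Because a variant that lies inside one of the regions is itself a candidate site, the sketch can return the original position. Whether a real simulator should exclude that case is a design choice this example does not settle.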

Furthermore, MOAT can use a whole genome signal track to compute the input variants’ signal scores and aggregate them into annotation signal scores. This is done alongside the permutation algorithms, enabling users to gauge the statistical significance of an elevated annotation signal score in addition to that of an elevated mutation burden. We have released one such signal track derived from Funseq2 (funseq2.gersteinlab.org), a framework for evaluating the functional impact of single nucleotide variants.
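A small Python sketch of the aggregation step is given here for orientation only. It assumes the signal track has already been loaded as a per-base list of scores for one chromosome and that variant scores are aggregated by summation. The aggregation function MOAT actually uses is not stated here, and the name `annotation_signal_score` is hypothetical.

```python
def annotation_signal_score(variant_positions, signal, annotation):
    """Sum the per-base signal at every variant that falls inside the
    half-open annotation interval (start, end).

    Assumption: summation is used as the aggregate; a mean or maximum
    would be a one-line change.
    """
    start, end = annotation
    return sum(signal[p] for p in variant_positions if start <= p < end)
```

Applying the same function to each permuted variant set gives a null distribution of annotation scores, against which the observed score can be ranked in the same way as the mutation count.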

Download MOAT source code (zip archive, 614 KB)

Download Funseq2 whole genome signal track (compressed bigWig, 7.3 GB)

Download reference genome FASTA files (.tar.gz, 905 MB)
(Required for MOAT-v and MOATsim)