Functions

  • binomTest(x: Int, n: Int, p: Double, alternative: String): Double

    Returns the p-value from the exact binomial test of the null hypothesis that success has probability p, given x successes in n trials.

    Examples

    Test each variant for allele balance across all heterozygous genotypes, under the null hypothesis that the two alleles are sampled with equal probability.

    >>> (vds.split_multi()
    ...   .annotate_variants_expr(
    ...   'va.ab_binom_test = let all_samples_ad = gs.filter(g => g.isHet).map(g => g.ad).sum() in '
    ...   'binomTest(all_samples_ad[1], all_samples_ad.sum(), 0.5, "two.sided")'))
    

    Arguments

    • x (Int) – Number of successes
    • n (Int) – Number of trials
    • p (Double) – Probability of success under the null hypothesis
    • alternative (String) – Alternative hypothesis, must be “two.sided”, “greater” or “less”.
  • chisq(c1: Int, c2: Int, c3: Int, c4: Int): Struct{pValue:Double,oddsRatio:Double}

    • pValue (Double) – p-value
    • oddsRatio (Double) – odds ratio

    Calculates p-value (Chi-square approximation) and odds ratio for 2x2 table

    Arguments

    • c1 (Int) – value for cell 1
    • c2 (Int) – value for cell 2
    • c3 (Int) – value for cell 3
    • c4 (Int) – value for cell 4
  • combineVariants(left: Variant, right: Variant): Struct{variant:Variant,laIndices:Dict[Int,Int],raIndices:Dict[Int,Int]}

    • variant (Variant) – Resulting combined variant.
    • laIndices (Dict[Int, Int]) – Mapping from new to old allele index for the left variant.
    • raIndices (Dict[Int, Int]) – Mapping from new to old allele index for the right variant.

    Combines the alleles of two variants at the same locus, making sure that ref and alt alleles are represented uniformely. In addition to the resulting variant containing all alleles, this function also returns the mapping from the old to the new allele indices. Note that this mapping counts the reference allele always contains the reference allele mapping 0 -> 0.

    let left = Variant("1:1000:AT:A,CT") and right = Variant("1:1000:A:C,AGG") in combineVariants(left,right)
    result: Struct{'variant': Variant(contig=1, start=1000, ref=AT, alts=[AltAllele(ref=AT, alt=A), AltAllele(ref=AT, alt=CT), AltAllele(ref=AT, alt=AGGT)]), 'laIndices': {0: 0, 1: 1, 2: 2}, 'raIndices': {0:0, 2: 1, 3: 2}}
    

    Arguments

    • left (Variant) – Left variant to combine.
    • right (Variant) – Right variant to combine.
  • ctt(c1: Int, c2: Int, c3: Int, c4: Int, minCellCount: Int): Struct{pValue:Double,oddsRatio:Double}

    • pValue (Double) – p-value
    • oddsRatio (Double) – odds ratio

    Calculates p-value and odds ratio for 2x2 table. If any cell is lower than minCellCount Fishers exact test is used, otherwise faster chi-squared approximation is used.

    Arguments

    • c1 (Int) – value for cell 1
    • c2 (Int) – value for cell 2
    • c3 (Int) – value for cell 3
    • c4 (Int) – value for cell 4
    • minCellCount (Int) – Minimum cell count for using chi-squared approximation
  • Dict(keys: Array[T], values: Array[U]): Dict[T, U]

    Construct a Dict from an array of keys and an array of values.

    Arguments

    • keys (Array[T]) – Keys of Dict.
    • values (Array[U]) – Values of Dict.
  • dpois(x: Double, lambda: Double, logP: Boolean): Double

    Returns Prob(\(X\) = x) from a Poisson distribution with rate parameter lambda.

    Arguments

    • x (Double) – Non-negative number at which to compute the probability density.
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
    • logP (Boolean) – If true, probabilities are returned as log(p).
  • dpois(x: Double, lambda: Double): Double

    Returns Prob(\(X\) = x) from a Poisson distribution with rate parameter lambda.

    Arguments

    • x (Double) – Non-negative number at which to compute the probability density.
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
  • drop(s: Struct, identifiers: String*): Struct

    Return a new Struct with the a subset of fields not matching identifiers.

    let s = {gene: "ACBD", function: "LOF", nHet: 12} in drop(s, gene, function)
    result: {nHet: 12}
    

    Arguments

    • s (Struct) – Struct to drop fields from.
    • identifiers (String*) – Field names to drop from s. Multiple arguments allowed.
  • exp(x: Double): Double

    Returns Euler’s number e raised to the power of the given value x.

    Arguments

    • x (Double) – the exponent to raise e to.
  • fet(a: Int, b: Int, c: Int, d: Int): Struct{pValue:Double,oddsRatio:Double,ci95Lower:Double,ci95Upper:Double}

    • pValue (Double) – p-value
    • oddsRatio (Double) – odds ratio
    • ci95Lower (Double) – lower bound for 95% confidence interval
    • ci95Upper (Double) – upper bound for 95% confidence interval

    Calculates the p-value, odds ratio, and 95% confidence interval with Fisher’s exact test (FET) for 2x2 tables.

    Examples

    Annotate each variant with Fisher’s exact test association results (assumes minor/major allele count variant annotations have been computed):

    >>> (vds.annotate_variants_expr(
    ...   'va.fet = let macCase = gs.filter(g => sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and '
    ...   'macControl = gs.filter(g => !sa.pheno.isCase).map(g => g.nNonRefAlleles()).sum() and '
    ...   'majCase = gs.filter(g => sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() and '
    ...   'majControl = gs.filter(g => !sa.pheno.isCase).map(g => 2 - g.nNonRefAlleles()).sum() in '
    ...   'fet(macCase, macControl, majCase, majControl)'))
    

    Notes

    fet is identical to the version implemented in R with default parameters (two-sided, alpha = 0.05, null hypothesis that the odds ratio equals 1).

    Arguments

    • a (Int) – value for cell 1
    • b (Int) – value for cell 2
    • c (Int) – value for cell 3
    • d (Int) – value for cell 4
  • filtering_allele_frequency(ac: Int, an: Int, ci: Double): Double

    Computes a filtering allele frequency (described below) for ac and an with confidence ci.

    The filtering allele frequency is the highest true population allele frequency for which the upper bound of the ci (confidence interval) of allele count under a Poisson distribution is still less than the variant’s observed ac (allele count) in the reference sample, given an an (allele number).

    This function defines a “filtering AF” that represents the threshold disease-specific “maximum credible AF” at or below which the disease could not plausibly be caused by that variant. A variant with a filtering AF >= the maximum credible AF for the disease under consideration should be filtered, while a variant with a filtering AF below the maximum credible remains a candidate. This filtering AF is not disease-specific: it can be applied to any disease of interest by comparing with a user-defined disease-specific maximum credible AF.

    For more details, see: Whiffin et al., 2017

    Arguments

    • ac (Int) – Allele count
    • an (Int) – Allele number
    • ci (Double) – Confidence interval. Should be between 0.0 and 1.0.
  • gamma(x: Double): Double

    Returns the value of the gamma function at x.

    Arguments

    • x (Double) – the input to gamma.
  • Genotype(v: Variant, c: Int, ad: Array[Int], dp: Int, gq: Int, pl: Array[Int]): Genotype

    Construct a Genotype object by specifying the variant, call, allelic depths, depth, genotype quality, and phred-scaled likelihoods.

    let v = Variant("7:76324539:A:G") and call = Call(0) and
      ad = [10, 0] and dp = 10 and gq = 20 and pl = [0, 10, 100] and
      g = Genotype(v, call, ad, dp, gq, pl) in g.isHomRef()
    result: true
    

    Arguments

    • v (Variant) – Variant
    • c (Int) – Call
    • ad (Array[Int]) – Allelic depths
    • dp (Int) – Depth
    • gq (Int) – Genotype quality
    • pl (Array[Int]) – Phred-scaled likelihoods
  • Genotype(v: Variant, gt: Int, prob: Array[Double]): Genotype

    Construct a Genotype object from a variant, a genotype call, and an array of genotype probabilities.

    let v = Variant("7:76324539:A:G") and gt = 0 and prob = [0.8, 0.1, 0.1] and
      g = Genotype(v, gt, prob) in g.isHomRef()
    result: true
    

    Arguments

    • v (Variant) – Variant
    • gt (Int) – Genotype call integer
    • prob (Array[Double]) – Genotype probabilities
  • Genotype(v: Variant, prob: Array[Double]): Genotype

    Construct a Genotype object from a variant and an array of genotype probabilities.

    let v = Variant("7:76324539:A:G") and prob = [0.2, 0.7, 0.1] and
      g = Genotype(v, prob) in g.isHet()
    result: true
    

    Arguments

    • v (Variant) – Variant
    • prob (Array[Double]) – Genotype probabilities
  • gtIndex(j: Int, k: Int): Int

    Convert from j/k pair to genotype index (triangular numbers).

    Arguments

    • j (Int) – j in j/k pairs.
    • k (Int) – k in j/k pairs.
  • gtj(i: Int): Int

    Convert from genotype index (triangular numbers) to j/k pairs. Returns j.

    Arguments

    • i (Int) – Genotype index.
  • gtk(k: Int): Int

    Convert from genotype index (triangular numbers) to j/k pairs. Returns k.

    Arguments

    • k (Int) – Genotype index.
  • hwe(nHomRef: Int, nHet: Int, nHomVar: Int): Struct{rExpectedHetFrequency:Double,pHWE:Double}

    • rExpectedHetFrequency (Double) – Expected rHeterozygosity based on Hardy Weinberg Equilibrium
    • pHWE (Double) – P-value

    Compute Hardy Weinberg Equilbrium (HWE) p-value.

    Examples

    Compute HWE p-value per variant:

    >>> (vds.annotate_variants_expr('va.hwe = '
    ...     'let nHomRef = gs.filter(g => g.isHomRef()).count().toInt() and '
    ...     'nHet = gs.filter(g => g.isHet()).count().toInt() and '
    ...     'nHomVar = gs.filter(g => g.isHomVar()).count().toInt() in '
    ...     'hwe(nHomRef, nHet, nHomVar)'))
    

    Notes

    Hail computes the exact p-value with mid-p-value correction, i.e. the probability of a less-likely outcome plus one-half the probability of an equally-likely outcome. See this document for details on the Levene-Haldane distribution and references.

    Arguments

    • nHomRef (Int) – Number of samples that are homozygous for the reference allele.
    • nHet (Int) – Number of samples that are heterozygotes.
    • nHomVar (Int) – Number of samples that are homozygous for the alternate allele.
  • index(structs: Array[Struct], identifier: String): Dict[String, Struct]

    Returns a Dict keyed by the string field identifier of each Struct in the Array and values that are structs with the remaining fields.

    let a = [{PLI: 0.998, genename: "gene1", hits_in_exac: 1},
             {PLI: 0.0015, genename: "gene2", hits_in_exac: 10},
             {PLI: 0.9045, genename: "gene3", hits_in_exac: 2}] and
        d = index(a, genename) in global.gene_dict["gene1"]
    
    result: {PLI: 0.998, hits_in_exac: 1}
    
  • Interval(chr: String, start: Int, end: Int): Interval

    Constructs an interval from a given chromosome, start, and end.

    Arguments

    • chr (String) – Chromosome.
    • start (Int) – Starting position.
    • end (Int) – Ending position (exclusive).
  • Interval(s: String): Interval

    Returns an interval parsed in the same way as parse()

    Arguments

    • s (String) – The string to parse.
  • Interval(startLocus: Locus, endLocus: Locus): Interval

    Construct a Interval object. Intervals are left inclusive, right exclusive. This means that [chr1:1, chr1:3) contains chr1:1 and chr1:2.

    Arguments

    • startLocus (Locus) – Start position of interval
    • endLocus (Locus) – End position of interval
  • isDefined(a: T): Boolean – Returns true if item is non-missing. Otherwise, false.

  • isMissing(a: T): Boolean – Returns true if item is missing. Otherwise, false.

  • isnan(a: Double): Boolean – Returns true if the argument is NaN (not a number), false if the argument is defined but not NaN. Returns missing if the argument is missing.

  • json(x: T): String – Returns the JSON representation of a data type.

  • Locus(contig: String, pos: Int): Locus

    Construct a Locus object.

    let l = Locus("1", 10040532) in l.position
    result: 10040532
    

    Arguments

    • contig (String) – String representation of contig.
    • pos (Int) – SNP position or start of an indel.
  • Locus(s: String): Locus

    Construct a Locus object.

    let l = Locus("1:10040532") in l.position
    result: 10040532
    

    Arguments

    • s (String) – String of the form CHR:POS
  • log(x: Double, b: Double): Double

    Returns the base b logarithm of the given value x.

    Arguments

    • x (Double) – the number to take the base b logarithm of.
    • b (Double) – the base.
  • log(x: Double): Double

    Returns the natural logarithm of the given value x.

    Arguments

    • x (Double) – the number to take the natural logarithm of.
  • log10(x: Double): Double

    Returns the base 10 logarithm of the given value x.

    Arguments

    • x (Double) – the number to take the base 10 logarithm of.
  • merge(s1: Struct, s2: Struct): Struct

    Create a new Struct with all fields in s1 and s2.

    let s1 = {gene: "ACBD", function: "LOF"} and s2 = {a: 20, b: "hello"} in merge(s1, s2)
    result: {gene: "ACBD", function: "LOF", a: 20, b: "hello"}
    
  • orElse(a: T, b: T): T

    If a is not missing, returns a. Otherwise, returns b.

    Examples

    Replace missing phenotype values with the mean value:

    >>> [mean_height] = vds.query_samples(['samples.map(s => sa.pheno.height).stats()'])['mean']
    >>> vds.annotate_samples_expr('sa.pheno.heightImputed = orElse(sa.pheno.height, %d)' % mean_height)
    
  • orMissing(a: Boolean, b: T): T – If predicate evaluates to true, returns value. Otherwise, returns NA.

  • pchisqtail(x: Double, df: Double): Double

    Returns right-tail probability p for which p = Prob(\(Z^2\) > x) with \(Z^2\) a chi-squared random variable with degrees of freedom specified by df. x must be positive.

    Arguments

    • x (Double) – Number at which to compute the probability.
    • df (Double) – Degrees of freedom.
  • pcoin(p: Double): Boolean

    Returns true with probability p. This function is non-deterministic.

    Arguments

    • p (Double) – Probability. Should be between 0.0 and 1.0.
  • pnorm(x: Double): Double

    Returns left-tail probability p for which p = Prob(\(Z\) < x) with \(Z\) a standard normal random variable.

    Arguments

    • x (Double) – Number at which to compute the probability.
  • pow(b: Double, x: Double): Double

    Returns b raised to the power of x.

    Arguments

    • b (Double) – the base.
    • x (Double) – the exponent.
  • ppois(x: Double, lambda: Double, lowerTail: Boolean, logP: Boolean): Double

    If lowerTail equals true, returns Prob(\(X \leq\) x) where \(X\) is a Poisson random variable with rate parameter lambda. If lowerTail equals false, returns Prob(\(X\) > x).

    Arguments

    • x (Double) – Non-negative number at which to compute the probability density.
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
    • lowerTail (Boolean) – If false, returns the exclusive right-tail probability \(P(X > x)\).
    • logP (Boolean) – If true, probabilities are returned as log(p).
  • ppois(x: Double, lambda: Double): Double

    Returns the left-tail Prob(\(X \leq\) x) where \(X\) is a Poisson random variable with rate parameter lambda.

    Arguments

    • x (Double) – Non-negative bound for the left-tail cumulative probability.
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
  • qchisqtail(p: Double, df: Double): Double

    Returns right-quantile x for which p = Prob(\(Z^2\) > x) with \(Z^2\) a chi-squared random variable with degrees of freedom specified by df. p must satisfy 0 < p <= 1. Inverse of pchisq1tail.

    Arguments

    • p (Double) – Probability
    • df (Double) – Degrees of freedom.
  • qnorm(p: Double): Double

    Returns left-quantile x for which p = Prob(\(Z\) < x) with \(Z\) a standard normal random variable. p must satisfy 0 < p < 1. Inverse of pnorm.

    Arguments

    • p (Double) – Probability
  • qpois(p: Double, lambda: Double, lowerTail: Boolean, logP: Boolean): Int

    If lowerTail equals true, returns the smallest integer \(x\) such that Prob(\(X \leq x\)) \(\geq\) p where \(X\) is a Poisson random variable with rate parameter lambda. If lowerTail equals false, returns the largest integer \(x\) such that Prob(\(X > x\)) \(\geq\) p. Inverts ppois.

    Arguments

    • p (Double) – Quantile to compute. Must satisfy \(0 \leq p \leq 1\).
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
    • lowerTail (Boolean) – If false, returns the right-tail inverse cumulative density function.
    • logP (Boolean) – If true, input quantiles are given as log(p).
  • qpois(p: Double, lambda: Double): Int

    Returns the smallest integer \(x\) such that Prob(\(X \leq x\)) \(\geq\) p where \(X\) is a Poisson random variable with rate parameter lambda. Inverts ppois.

    Arguments

    • p (Double) – Quantile to compute. Must satisfy \(0 \leq p \leq 1\).
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
  • range(start: Int, stop: Int, step: Int): Array[Int]

    Generate an Array with values in the interval [start, stop) in increments of step.

    let r = range(0, 5, 2) in r
    result: [0, 2, 4]
    

    Arguments

    • start (Int) – Starting number of the sequence.
    • stop (Int) – Generate numbers up to, but not including this number.
    • step (Int) – Difference between each number in the sequence.
  • range(start: Int, stop: Int): Array[Int]

    Generate an Array with values in the interval [start, stop).

    let r = range(5, 8) in r
    result: [5, 6, 7]
    

    Arguments

    • start (Int) – Starting number of the sequence.
    • stop (Int) – Generate numbers up to, but not including this number.
  • range(stop: Int): Array[Int]

    Generate an Array with values in the interval [0, stop).

    let r = range(3) in r
    result: [0, 1, 2]
    

    Arguments

    • stop (Int) – Number of integers (whole numbers) to generate, starting from zero.
  • rnorm(mean: Double, sd: Double): Double

    Returns a random draw from a normal distribution with mean mean and standard deviation sd. sd should be non-negative. This function is non-deterministic.

    Arguments

    • mean (Double) – Mean value of normal distribution.
    • sd (Double) – Standard deviation of normal distribution.
  • rpois(n: Int, lambda: Double): Array[Double]

    Returns n random draws from a Poisson distribution with rate parameter lambda. This function is non-deterministic.

    Arguments

    • n (Int) – Number of random draws to make.
    • lambda (Double) – Poisson rate parameter. Must be non-negative.
  • rpois(lambda: Double): Double

    Returns a random draw from a Poisson distribution with rate parameter lambda. This function is non-deterministic.

    Arguments

    • lambda (Double) – Poisson rate parameter. Must be non-negative.
  • runif(min: Double, max: Double): Double

    Returns a random draw from a uniform distribution on [min, max). min should be less than or equal to max. This function is non-deterministic.

    Arguments

    • min (Double) – Minimum value of interval.
    • max (Double) – Maximum value of interval, non-inclusive.
  • select(s: Struct, identifiers: String*): Struct

    Return a new Struct with a subset of fields.

    let s = {gene: "ACBD", function: "LOF", nHet: 12} in select(s, gene, function)
    result: {gene: "ACBD", function: "LOF"}
    

    Arguments

    • s (Struct) – Struct to select fields from.
    • identifiers (String*) – Field names to select from s. Multiple arguments allowed.
  • sqrt(x: Double): Double

    Returns the square root of the given value x.

    Arguments

    • x (Double) – the number to take the square root of.
  • str(x: T): String

    Returns the string representation of a data type.

    let v = Variant("1", 278653, "A", "T") in str(v)
    result: "1:278653:A:T"
    
  • Variant(contig: String, pos: Int, ref: String, alts: Array[String]): Variant

    Construct a Variant object.

    let v = Variant("1", 25782743, "A", Array("T", "TA")) in v.ref
    result: "A"
    

    Arguments

    • contig (String) – String representation of contig.
    • pos (Int) – SNP position or start of an indel.
    • ref (String) – Reference allele sequence.
    • alts (Array[String]) – Array of alternate allele sequences.
  • Variant(contig: String, pos: Int, ref: String, alt: String): Variant

    Construct a Variant object.

    let v = Variant("2", 13427, "T", "G") in v.ref
    result: "T"
    

    Arguments

    • contig (String) – String representation of contig.
    • pos (Int) – SNP position or start of an indel.
    • ref (String) – Reference allele sequence.
    • alt (String) – Alternate allele sequence.
  • Variant(s: String): Variant

    Construct a Variant object.

    let v = Variant("7:76324539:A:G") in v.contig
    result: "7"
    

    Arguments

    • s (String) – String of the form CHR:POS:REF:ALT or CHR:POS:REF:ALT1,ALT2...ALTN specifying the contig, position, reference and alternate alleles.