Doubleton sharing
Doubleton sharing, a.k.a. analysis of f2 variants.
See also the examples at:
Count subpopulation pairs sharing doubletons (where one allele is observed in each subpopulation).
Parameters: subpops_ac : array_like, int
An array of shape (n_variants, n_subpops) holding alternate allele counts for each subpopulation.
Returns: counts : ndarray, int or float
A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
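As an illustration of the counting described above, here is a minimal NumPy sketch. It is not the library's implementation: the helper name `count_shared_doubletons_sketch` is hypothetical. The sketch also assumes that the result is symmetric and that the diagonal element (i, i) counts doubletons whose two copies both fall within subpopulation i.

```python
import numpy as np


def count_shared_doubletons_sketch(subpops_ac):
    """Hypothetical helper: count doubletons within and between subpopulations."""
    ac = np.asarray(subpops_ac)
    n_subpops = ac.shape[1]
    # Keep only doubletons: variants whose alternate allele is seen exactly
    # twice across all subpopulations combined.
    doubletons = ac[ac.sum(axis=1) == 2]
    counts = np.zeros((n_subpops, n_subpops), dtype=int)
    for i in range(n_subpops):
        # Assumption: the diagonal holds doubletons private to subpopulation i.
        counts[i, i] = np.count_nonzero(doubletons[:, i] == 2)
        for j in range(i + 1, n_subpops):
            # One copy observed in each of subpopulations i and j.
            shared = np.count_nonzero(
                (doubletons[:, i] == 1) & (doubletons[:, j] == 1)
            )
            counts[i, j] = counts[j, i] = shared
    return counts


# Toy alternate allele counts: 7 variants, 3 subpopulations.
subpops_ac = np.array([
    [2, 0, 0],  # doubleton private to subpopulation 0
    [1, 1, 0],  # shared between subpopulations 0 and 1
    [0, 1, 1],  # shared between subpopulations 1 and 2
    [1, 0, 1],  # shared between subpopulations 0 and 2
    [1, 1, 0],  # shared between subpopulations 0 and 1
    [3, 0, 0],  # tripleton, ignored
    [1, 0, 0],  # singleton, ignored
])
counts = count_shared_doubletons_sketch(subpops_ac)
print(counts)
```

With this toy input, element (0, 1) is 2 because two variants have one alternate allele in each of the first two subpopulations.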
anhima.f2.normalise_doubleton_counts(counts, n_samples, ploidy=2)
Normalise doubleton counts by dividing by the number of distinct pairs of haplotypes in each population comparison.
Parameters: counts : array_like, ints
A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
n_samples : int or sequence of ints
The number of samples in each sub-population.
ploidy : int, optional
The sample ploidy.
Returns: normed_counts : ndarray, float
Normalised counts of shared doubletons.
Notes
This function corrects for the fact that there are fewer pairs of haplotypes when looking for doubletons within a single subpopulation of size n than there are when comparing two different subpopulations of size n.
This function may also help to correct for unequal numbers of samples per subpopulation. Even then, some bias may remain in how doubletons were ascertained.
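The pair-count correction can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper name `normalise_sketch` is hypothetical. It assumes that a subpopulation of n samples contributes h = n * ploidy haplotypes, that a within-subpopulation comparison has h * (h - 1) / 2 distinct pairs, and that a comparison between subpopulations i and j has h_i * h_j pairs.

```python
import numpy as np


def normalise_sketch(counts, n_samples, ploidy=2):
    """Hypothetical helper: divide doubleton counts by haplotype pair counts."""
    counts = np.asarray(counts, dtype=float)
    n_subpops = counts.shape[0]
    # Accept a single int or one sample size per subpopulation.
    n_haps = np.broadcast_to(np.asarray(n_samples), (n_subpops,)) * ploidy
    # Between subpopulations i and j: h_i * h_j pairs (assumption).
    n_pairs = np.outer(n_haps, n_haps).astype(float)
    # Within one subpopulation: h * (h - 1) / 2 distinct pairs (assumption).
    np.fill_diagonal(n_pairs, n_haps * (n_haps - 1) / 2)
    return counts / n_pairs


# Two subpopulations of 5 diploid samples each, so 10 haplotypes per
# subpopulation: 45 pairs within each, 100 pairs between them.
raw_counts = np.array([[45, 100],
                       [100, 90]])
normed = normalise_sketch(raw_counts, n_samples=5, ploidy=2)
print(normed)
```

In this toy case the raw within-subpopulation count of 45 and the between-subpopulation count of 100 both normalise to 1.0, which shows why raw counts understate sharing within a subpopulation relative to sharing between two of the same size.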
Plot counts of doubleton sharing between subpopulations as a bar chart.
Parameters: counts : array_like, ints
A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
subpop_labels : sequence of strings, optional
Labels for the subpopulations.
subpop_colors : sequence of colors, optional
Colors for the subpopulations.
axs : sequence of axes, optional
The axes to use. If not provided, a new figure will be created.
figsize_factor : float, optional
Figure size in inches per subpopulation. Only used if axs is None.
ylim : pair of ints or floats, optional
Limits for the Y axes of all subplots.
relative : bool, optional
If True, normalise counts by dividing by the sum along each row.
flip : bool, optional
If True, invert the Y axis.
Returns: axs : sequence of axes
The axes on which the plot was drawn.
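For reference, the row-wise normalisation described for the relative option can be sketched in plain NumPy. The variable names are illustrative only, and how the library handles rows that sum to zero is not stated here, so this sketch assumes every row has at least one doubleton.

```python
import numpy as np

# Toy matrix of shared doubleton counts for three subpopulations.
shared_counts = np.array([[1, 2, 1],
                          [2, 0, 1],
                          [1, 1, 0]])

# Divide each row by its own sum, so each row becomes a set of proportions.
# Assumption: no row sums to zero (that would produce NaN values).
row_sums = shared_counts.sum(axis=1, keepdims=True)
relative_counts = shared_counts / row_sums
print(relative_counts)
```

After this step each row sums to 1, so the values describe how one subpopulation's doubletons are distributed across all subpopulations, not absolute counts.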
anhima.f2.plot_total_doubletons(counts, subpop_labels=None, width=0.8, orientation='vertical', n_samples=None, ax=None, bar_kwargs=None)
Plot total counts of doubletons per subpopulation as a bar chart.
Parameters: counts : array_like, ints
A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
subpop_labels : sequence of strings, optional
Labels for the subpopulations.
width : float, optional
The relative width of each bar.
orientation : {'vertical', 'horizontal'}, optional
The bar orientation.
n_samples : int or sequence of ints, optional
The number of samples in each sub-population.
ax : axes, optional
The axes on which to plot. If not provided, a new figure will be created.
bar_kwargs : dict, optional
Keyword arguments passed through to ax.bar().
Returns: ax : axes
The axes on which the plot was drawn.
anhima.f2.plot_f2_fig(counts, subpop_labels=None, subpop_colors='bgrcmyk', fig=None, figsize_factor=1, relative=False, normed=False, n_samples=None, ploidy=2)
Plot a combined figure of shared doubleton counts and total counts per subpopulation.
Parameters: counts : array_like, ints
A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
subpop_labels : sequence of strings, optional
Labels for the subpopulations.
subpop_colors : sequence of colors, optional
Colors for the subpopulations.
fig : figure, optional
The figure to use. If not provided, a new figure will be created.
figsize_factor : float, optional
Figure size in inches per subpopulation. Only used if fig is None.
relative : bool, optional
If True, plot counts relative to the sum along each row.
normed : bool, optional
If True, normalise counts by dividing by the number of possible pairs of haplotypes.
n_samples : int or sequence of ints, optional
The number of samples in each sub-population.
ploidy : int, optional
The sample ploidy. (Only relevant if normed is True.)
Returns: fig : figure
The figure on which the plot was drawn.