Results
NERRs eDNA
Metabarcoding analysis for NERRs sites sampled quarterly. Metadata for all sites is available here, including SWMP data when available. SWMP data were downloaded from https://cdmo.baruch.sc.edu
OTU Abundance
Interactive barplots of relative abundance can be found here. Non-interactive plots broken up by region can be found here.
Core diversity metrics
PCoA is performed on the distance matrices for the metrics below (PCoA seems to handle missing data better than PCA does). More complete descriptions are here: https://docs.onecodex.com/en/articles/4150649-beta-diversity
- Alpha diversity
- Rarefaction
- Shannon’s diversity index (a quantitative measure of community richness)
- Faith’s Phylogenetic Diversity (a qualitative measure of community richness that incorporates phylogenetic relationships between the features) Kruskal-Wallis
- Observed OTUs (a qualitative measure of community richness) Kruskal-Wallis
- Evenness (or Pielou’s Evenness; a measure of community evenness)
- Beta diversity
- Jaccard distance (a qualitative measure of community dissimilarity based on presence/absence: the percentage of taxa not found in both samples) jaccard emperor
- Bray-Curtis distance (a quantitative measure of community dissimilarity that takes both abundance and presence/absence into consideration) bray curtis emperor
- Unweighted UniFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships between the features. Percentage of phylogenetic branch length not found in both samples) unweighted unifrac emperor
- Weighted UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships between the features. Similar to Bray-Curtis but takes into consideration phylogenetic relationships) weighted unifrac emperor
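As a rough illustration of what the non-phylogenetic metrics above measure, here is a minimal Python sketch that works on raw count vectors. The function names are hypothetical and this is not the QIIME 2 implementation. The base-2 logarithm for Shannon is an assumption, and the phylogenetic metrics (Faith's PD, UniFrac) are left out because they need a tree.

```python
import math


def observed_features(counts):
    """Number of features (OTUs/ASVs) with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon diversity; the log base is an assumption (base 2 here)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def pielou_evenness(counts):
    """Shannon diversity (natural log) divided by its maximum, ln(S)."""
    s = observed_features(counts)
    if s < 2:
        return 0.0  # convention chosen for this sketch
    return shannon(counts, base=math.e) / math.log(s)


def jaccard_distance(a, b):
    """Fraction of features present in only one of the two samples."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Sum of absolute count differences over the total count."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total
```

Two samples with the same taxa but very different abundances have a Jaccard distance of 0 and a Bray-Curtis distance well above 0, which is the qualitative versus quantitative distinction drawn in the list.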
UniFrac PCoA performed on the unweighted UniFrac distance matrix
Samples are colored by the minimum salinity in SWMP data collected within X days of eDNA sample collection. The interactive plot can be found here.
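For readers unfamiliar with PCoA, the sketch below shows the core idea in plain Python: double-center the squared distance matrix, then take the leading eigenvector scaled by the square root of its eigenvalue. It recovers only the first axis, uses power iteration instead of a full eigendecomposition, and assumes the dominant eigenvalue is positive. The function name is hypothetical and this is not the code behind the plots; in practice a library routine such as scikit-bio's `pcoa` would be the usual choice.

```python
import math


def pcoa_first_axis(dist, iterations=500):
    """First principal coordinate of a symmetric distance matrix.

    A sketch only: assumes the dominant eigenvalue of the centered
    matrix is positive, which holds for Euclidean-embeddable distances.
    """
    n = len(dist)
    # Gower centering of A = -0.5 * D^2.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand_mean = sum(row_means) / n
    b = [
        [a[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]
    # Power iteration for the leading eigenpair of B.
    v = [float(i + 1) for i in range(n)]  # non-constant starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    # Sample coordinates: eigenvector scaled by sqrt(eigenvalue).
    return [x * math.sqrt(eigenvalue) for x in v]
```

For three samples that lie on a line, the distances between the returned coordinates reproduce the input distances; the sign of the axis is arbitrary.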
Phylogenetic RPCA and CTF.
“… robust principal-component analysis (RPCA) addresses sparsity and compositionality; compositional tensor factorization (CTF) addresses sparsity, compositionality, and repeated measure study designs; and UniFrac incorporates phylogenetic information. Here we introduce a strategy of incorporating phylogenetic information into RPCA and CTF. The resulting methods, phylo-RPCA, and phylo-CTF, provide substantial improvements over state-of-the-art methods in terms of discriminatory power of underlying clustering …” (https://pmc.ncbi.nlm.nih.gov/articles/PMC9238373/)
The Qurro interactive plots are for exploring the log-fold-change abundance of the features loading on the axes of each PCoA. The features can be plotted for groups of samples (grouped by a metadata column) or along a continuous variable (e.g., salinity).
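To make the quantity on the y-axis of these plots concrete, here is a minimal sketch of a per-sample log-ratio between two sets of features. The function name, the dictionary layout, and the choice to return `None` when either side sums to zero are assumptions for illustration, not Qurro's own code.

```python
import math


def log_ratio(sample_counts, numerator, denominator):
    """Natural log of (summed numerator counts / summed denominator counts).

    sample_counts maps feature ID -> count for one sample. Returns None
    when either sum is zero, because the log-ratio is undefined there.
    """
    top = sum(sample_counts.get(f, 0) for f in numerator)
    bottom = sum(sample_counts.get(f, 0) for f in denominator)
    if top == 0 or bottom == 0:
        return None
    return math.log(top / bottom)
```

Computing this value for every sample and plotting it against a metadata column (for example region or salinity) gives the kind of view described above.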
All Samples Phylogenetic RPCA
The phylogenetic EMPress visualization provides a phylogenetic tree of the ASVs alongside the ordination plots for the samples: phylo-empress
Explore the RPCA biplot feature loadings of the samples: qurro-phylogenetic-rpca-with-taxonomy
Regional Phylogenetic RPCA
SE_qurro-phylogenetic-rpca-with-taxonomy
NE_qurro-phylogenetic-rpca-with-taxonomy
N-Pacific_qurro-phylogenetic-rpca-with-taxonomy
Pacific-Island_qurro-phylogenetic-rpca-with-taxonomy
Gemelli CTF
“In order to account for the correlation among samples from the same subject we will employ compositional tensor factorization (CTF). CTF builds on the ability to account for compositionality and sparsity using the robust center log-ratio transform … but restructures and factors the data as a tensor. Here we will run CTF through gemelli and explore/interpret the different results.”
Pacific-Island_qurro-ctf-qurro
Longitudinal Volatility
Interactive line plots assess how volatile a dependent variable (ASV or taxonomic group) is over a continuous, independent variable (e.g., time) in one or more groups. Select which ASV or taxa to plot on the y-axis to examine how variance in diversity and other metadata changes across time (set with the state-column parameter) in groups of samples and in individual subjects (set with the individual-id-column parameter).
longitudinal_volatility_family
ASV volatility
Regional ASV Volatility
Regional Genus Volatility
N-Pacific feat volatility genus
PacIsland feat volatility genus
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Regional State Subject Longitudinal Volatility
N-Pacific_state_subject_ordination
PacIsland_state_subject_ordination
This ordination accounts for the correlation among samples from the same subject (site within NERR): rf-state_subject_ordination
Here, each point represents a site (subject) rather than an individual sample, which makes it easier to look for groupings by salinity or other features: state-subject-ordination
beta-group-significance: Group samples by a metadata column to determine whether they are significantly different from one another using a permutation-based statistical test. At the national scale,
Longitudinal pairwise distance
The pairwise-distances visualizer also assesses changes between paired samples from two different “states”, but instead of taking a metadata column or artifact as input, it operates on a distance matrix to assess the distance between “pre” and “post” sample pairs, and tests whether these paired differences are significantly different between different groups, as specified by the group-column parameter (QIIME 2 documentation). For our data, this tests whether the effect of season differs between regions; we expect northern climates to have a greater seasonal effect. Each comparison was performed using the unweighted UniFrac distance matrix.
These plots appear to support greater distances among northern samples between the 1st and 3rd quarters than among southern samples over the same timeframe: North_South-1_3
This comparison was performed using the gemelli CTF distance matrix (not sure if this is appropriate)
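The pairing step described above can be sketched as follows. This is an illustration only, not the QIIME 2 code: the function name, the metadata layout (`subject`, `state`, `group` keys), and the distance lookup keyed by a `frozenset` of two sample IDs are all assumptions. The statistical comparison between groups is left out.

```python
from collections import defaultdict


def paired_distances(dist, metadata, state_1, state_2):
    """Collect within-subject distances between two states, by group.

    dist: maps frozenset({sample_a, sample_b}) -> distance.
    metadata: maps sample ID -> {"subject", "state", "group"}.
    Subjects missing a sample at either state are skipped.
    """
    by_subject = defaultdict(dict)
    for sample_id, row in metadata.items():
        by_subject[row["subject"]][row["state"]] = (sample_id, row["group"])

    grouped = defaultdict(list)
    for states in by_subject.values():
        if state_1 in states and state_2 in states:
            sample_a, group = states[state_1]
            sample_b, _ = states[state_2]
            grouped[group].append(dist[frozenset((sample_a, sample_b))])
    return dict(grouped)
```

With quarter as the state, site as the subject, and region as the group, the output is one list of quarter-to-quarter distances per region, which is what a between-group test would then compare.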
Mixed effects models
linear-mixed-effects-by-region
linear-mixed-effects-by-salinity
Regional Analysis:
New England
Results:
Bar-plot images of sites broken up by regions here
Network plots of sites broken up by regions here
Picocyanobacteria
Gemelli
Gemelli provides methods for sparse compositional omics datasets. All of these methods are unsupervised and aim to describe sample/subject variation and the biological features that separate them.
- Robust Aitchison PCA (RPCA)
- Compositional Tensor Factorization (CTF)
The preprocessing transform for both RPCA and CTF is the robust centered log-ratio transform (rclr), which accounts for sparse data (i.e., many missing/zero values).
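A minimal sketch of the robust centered log-ratio idea for a single sample is shown below: take logs of the non-zero counts only and center them on their own mean, leaving zeros as missing. The function name and the use of `None` for missing entries are assumptions for illustration; this is not gemelli's implementation.

```python
import math


def rclr(counts):
    """Robust centered log-ratio of one sample's counts.

    Zeros are treated as missing (None) instead of being replaced by a
    pseudocount; observed values are centered on the mean of their logs.
    """
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        return [None] * len(counts)
    center = sum(logs) / len(logs)
    return [math.log(c) - center if c > 0 else None for c in counts]
```

Because the centering uses only observed entries, the transformed non-missing values of each sample sum to zero, and no pseudocount has to be chosen for the zeros.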