Results

NERRs eDNA

Metabarcoding analysis for NERRs sites sampled quarterly. Metadata for all sites are here, including SWMP data when available (downloaded from https://cdmo.baruch.sc.edu).

OTU Abundance

Interactive barplots of relative abundance can be found here. Non-interactive plots broken up by region can be found here: gulf

Core diversity metrics

PCoA is performed on distance matrices for the metrics below (it seems to handle missing data better than PCA does). More complete descriptions are here: https://docs.onecodex.com/en/articles/4150649-beta-diversity
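As a rough illustration of what PCoA does with a distance matrix, the sketch below double-centers the squared distances and extracts the first coordinate axis by power iteration. It is a minimal, standard-library-only sketch and not the QIIME 2 implementation used for these results; the function name `pcoa_first_axis` is hypothetical, and it assumes the dominant eigenvalue of the centered matrix is positive (true for Euclidean-like distances).

```python
import math


def pcoa_first_axis(dist, iterations=200):
    """Return the first principal coordinate for a symmetric distance matrix.

    dist is a list of lists. Assumes the dominant eigenvalue is positive.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Gower double-centering of the squared distances: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration converges to the eigenvector with the largest |eigenvalue|
    vec = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
        eigenvalue = norm
    # Coordinates are the eigenvector scaled by sqrt(eigenvalue)
    return [v * math.sqrt(eigenvalue) for v in vec]
```

For three samples that lie on a line, the distances between the returned coordinates reproduce the input distances (the sign of the axis is arbitrary).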

Unifrac: PCoA performed on the unweighted UniFrac distance matrix.

unifrac: Samples are colored by minimum salinity from SWMP data collected within X days of eDNA sample collection. This is an interactive plot that can be found here.

Phylogenetic RPCA and CTF.

“… robust principal-component analysis (RPCA) addresses sparsity and compositionality; compositional tensor factorization (CTF) addresses sparsity, compositionality, and repeated measure study designs; and UniFrac incorporates phylogenetic information. Here we introduce a strategy of incorporating phylogenetic information into RPCA and CTF. The resulting methods, phylo-RPCA, and phylo-CTF, provide substantial improvements over state-of-the-art methods in terms of discriminatory power of underlying clustering …” (https://pmc.ncbi.nlm.nih.gov/articles/PMC9238373/)

The qurro interactive plots are for exploring the log-fold change in abundance of the features loading on each PCoA axis. The features can be plotted for groups of samples (grouped by a metadata column) or along a continuous variable (e.g., salinity).

All Samples Phylogenetic rpca

The phylo empress viz provides a phylogenetic tree of the ASVs alongside the ordination plots for the samples: phylo-empress
Explore the RPCA biplot feature loadings of samples: qurro-phylogenetic-rpca-with-taxonomy

Regional phylogenetic rpca

SE_phylo-empress

SE_qurro-phylogenetic-rpca-with-taxonomy

NE_phylo-empress

NE_qurro-phylogenetic-rpca-with-taxonomy

N-Pacific_phylo-empress

N-Pacific_qurro-phylogenetic-rpca-with-taxonomy

Pacific-Island_phylo-empress

Pacific-Island_qurro-phylogenetic-rpca-with-taxonomy

Gemelli ctf

“In order to account for the correlation among samples from the same subject we will employ compositional tensor factorization (CTF). CTF builds on the ability to account for compositionality and sparsity using the robust center log-ratio transform … but restructures and factors the data as a tensor. Here we will run CTF through gemelli and explore/interpret the different results.”

NE_qurro-ctf-qurro

SE_qurro-ctf-qurro

N-Pacific_qurro-ctf-qurro

Pacific-Island_qurro-ctf-qurro

NO-island_qurro-ctf-qurro
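The robust centered log-ratio transform mentioned in the Gemelli CTF quote above can be sketched for a single sample as follows. This is only an illustration of the idea (log-transform the observed, non-zero counts and center them on their own mean, leaving zeros as missing); the function name `rclr` is hypothetical, and the matrix or tensor factorization that gemelli performs afterwards is not reproduced here.

```python
import math


def rclr(counts):
    """Robust centered log-ratio for one sample; zeros are returned as None."""
    # Only observed (non-zero) features contribute to the log mean
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        return [None] * len(counts)
    mean_log = sum(logs) / len(logs)
    # Center each observed log count; keep zeros as missing values
    return [math.log(c) - mean_log if c > 0 else None for c in counts]
```

Because zeros are treated as missing rather than replaced with a pseudocount, the transformed values for the observed features always sum to zero within a sample.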

Longitudinal Volatility

Interactive line plots assess how volatile a dependent variable (ASV or taxonomic group) is over a continuous, independent variable (e.g., time) in one or more groups. Select which ASV or taxa to plot on the y-axis to examine how variance in diversity and other metadata changes across time (set with the state-column parameter) in groups of samples and in individual subjects (set with the individual-id-column parameter).

longitudinal_volatility_ASVs

longitudinal_volatility_genus

longitudinal_volatility_family
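To make the idea behind the volatility plots concrete, the sketch below computes the per-group mean of a value at each state (e.g., quarter), which corresponds to the group trend lines drawn in such plots. It is a simplified, assumed illustration rather than the QIIME 2 visualizer itself; `group_trajectories` and the record layout are hypothetical.

```python
from statistics import mean


def group_trajectories(records):
    """Summarize (group, state, value) records as mean value per state per group.

    Returns {group: [(state, mean_value), ...]} with states in ascending order.
    """
    buckets = {}
    for group, state, value in records:
        buckets.setdefault(group, {}).setdefault(state, []).append(value)
    return {
        group: [(state, mean(values)) for state, values in sorted(states.items())]
        for group, states in buckets.items()
    }
```

Here each record would be one sample, with the group taken from a metadata column such as region and the state from the column passed as state-column.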

ASV volatility

Sal-Min-volatility

Regional ASV Volatility

SE_feat-volatility

N-Pacific_feat-volatility

PacIsland_feat-volatility

NE_feat-volatility

Regional Genus Volatility

SE feat volatility genus

N-Pacific feat volatility genus

PacIsland feat volatility genus

NE feat volatility genus

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Regional State Subject Longitudinal Volatility

SE_state_subject_ordination

N-Pacific_state_subject_ordination

PacIsland_state_subject_ordination

NE_state_subject_ordination

genus heatmap

heatmap

heatmap30

Accounts for the correlation among samples from the same subject (site within NERR): rf-state_subject_ordination

Here, points represent each site (subject) rather than each sample, to look for groupings by salinity or other features: state-subject-ordination

beta-group-significance: Group samples by a metadata column to determine whether they are significantly different from one another using a permutation-based statistical test (PERMANOVA). At the national scale:

beta-permanova-salinity

beta-permanova-region

beta-permanova-NERR

beta-permanova-Quarter

beta-permanova-Site
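The permutation-based test behind the beta-permanova results above can be sketched as follows: compute a pseudo-F statistic from the distance matrix and group labels, then compare it against the values obtained after shuffling the labels. This is a minimal standard-library sketch under simplifying assumptions (one grouping factor, non-zero within-group variation), not the implementation QIIME 2 calls; the function names are hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F for a square distance matrix and one label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    # Assumes ss_within > 0, i.e. samples within a group are not all identical
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero
    return observed, (hits + 1) / (permutations + 1)
```

A small p-value indicates that the observed grouping (e.g., by region or salinity class) separates samples more strongly than random label assignments do.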

Longitudinal pairwise distance

The pairwise-distances visualizer also assesses changes between paired samples from two different “states”, but instead of taking a metadata column or artifact as input, it operates on a distance matrix to assess the distance between “pre” and “post” sample pairs, and tests whether these paired differences are significantly different between different groups, as specified by the group-column parameter (QIIME 2 documentation). For our data, this tests whether the effect of season differs between regions. We expect northern climates to have a greater seasonal effect. Each comparison was performed using the unweighted UniFrac distance matrix.

Region_1-3

Region_2-4
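The pairing step described above can be sketched like this: for each subject (site), look up the distance between its sample in one state (e.g., quarter 1) and its sample in the other (e.g., quarter 3), then collect those distances by group (e.g., region). The data layout and the name `paired_state_distances` are assumptions made for illustration; the statistical comparison between groups that the visualizer then performs is not reproduced.

```python
def paired_state_distances(dist, metadata, state_1, state_2):
    """Collect within-subject distances between two states, keyed by group.

    dist: dict mapping frozenset({sample_a, sample_b}) -> distance
    metadata: dict mapping sample_id -> (subject, state, group)
    """
    by_subject = {}
    for sample_id, (subject, state, group) in metadata.items():
        by_subject.setdefault(subject, {})[state] = (sample_id, group)
    result = {}
    for states in by_subject.values():
        # Subjects missing either state cannot form a pair and are skipped
        if state_1 in states and state_2 in states:
            (sample_a, group), (sample_b, _) = states[state_1], states[state_2]
            result.setdefault(group, []).append(dist[frozenset((sample_a, sample_b))])
    return result
```

Larger paired distances in one group than another would suggest a stronger change between the two states in that group.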

These plots appear to support greater distances among northern samples between the 1st and 3rd quarters, compared to southern samples over that same timeframe: North_South-1_3

North_South-2_4

This comparison was performed using the gemelli CTF distance matrix (not sure if this is appropriate).

North_South-2_4-gemelli

Mixed effects models

linear-mixed-effects-by-region

linear-mixed-effects-by-salinity

phylo-salinity_significance

Regional Analysis:

New England

NE_subject_biplot

state_subject_ordination

accuracy_results

Results:

Bar-plot images of sites broken up by regions here

Network plots of sites broken up by regions here

Picocyanobacteria

Bar plot

Gemelli

Gemelli methods are intended for sparse compositional omics datasets. All of these methods are unsupervised and aim to describe sample/subject variation and the biological features that separate them.