This document describes the minimum information required to explain the AtlantECO datasets from the different data SILOs.
Please provide the information specified in blue. Keep the black text as it is.
This document should be attached to long-format tables for modeling.
Please name it as: [README] AtlantECO_WP2_AmpSeq_SILO_AbundanceSampleTaxonomyTables
Any doubts? Please contact Paula Huber or Clara Arboleda
_______________________________________________________________

TITLE: AtlantECO [WP2] - AmpSeq 18S LongTable dataset - 202212
_______________________________________________________________

1.- INTRODUCTION

This table contains the abundance and taxonomy information of eukaryotic ASVs from 1405 samples collected during the Malaspina and Tara Oceans oceanographic expeditions.

2.- METHODOLOGY USED

The metabarcoding dataset integrates quality-controlled sequencing data from the Tara Oceans and Malaspina expeditions. The V9 region of the 18S rRNA gene was amplified with the primers 1389F: 5'-TTGTACACACCGCCC-3' and 1510R: 5'-CCTTCYGCAGGTTCACCTAC-3' (Amaral-Zettler et al., 2009). Amplicon sequences were processed using the DADA2 pipeline (Callahan et al., 2016; Lee, 2019) to characterize Amplicon Sequence Variants (ASVs), which were used as a proxy for microbial species (Callahan et al., 2017). Following Callahan et al. (2016), each sequencing project was analyzed separately because different runs can have different error profiles. The quality of the samples was explored, and the trimming and filtering parameters were chosen according to Callahan et al. (2016). After merging the runs, taxonomic classification was performed using the IDTAXA algorithm implemented in the DECIPHER package for the R programming language (Murali et al., 2018), with the PR2 reference database (PR2 v4.13) for the 18S rRNA gene (Guillou et al., 2012).
Only samples with more than 10,000 reads were analyzed. We kept ASVs with 50 or more reads distributed across at least three samples, as well as ASVs with fewer than 50 reads distributed across more than three samples.

3.- DATASET DESCRIPTION

Data type: Relative abundance data
Latitude/Longitude format: Decimal degrees (DD)
Geographic area covered by the dataset: Global Ocean
Depth range covered by the dataset: Min 3 m, Max 4000 m
Time period covered by the dataset: 15-09-2009 to 27-10-2013
Dataset format: csv (comma-separated values)
Date of dataset creation: 23-02-2023
Raw dataset repository: ENA (European Nucleotide Archive) and MARBITS (Marine Bioinformatics Platform at ICM-CSIC)

4.- MAIN VARIABLE DESCRIPTION

MeasurementTypeID: ASV ID
MeasurementValue: Number of reads
MeasurementID: ASV

5.- LINKS

Links to the document with variables description for data submission to AtlantECO base:
1. AtlantECO-Base-v1_microbiome_genomic_AmpSeq18S_LongTable_UFSCar_202302.csv: https://drive.google.com/file/d/1-myhKRywwsHaG5IN9XmRdaZ8KbXoyiTZ/view?usp=share_link

6.- CONTRIBUTORS

Hugo Sarmento / UFSCar / Brazil / hugo.sarmento@gmail.com
Clara Arboleda-Baena / UFSCar / Brazil / claraarboledab@gmail.com
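
EXAMPLE (illustrative only)

The read-depth threshold described in section 2 (only samples with more than 10,000 reads were analyzed) can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline, which was run in R. It assumes that MeasurementValue holds raw read counts, as stated in section 4. The column name SampleID, the helper function names, and the demo rows are assumptions made for this example; MeasurementTypeID and MeasurementValue follow section 4.

```python
import csv
import io
from collections import defaultdict


def load_long_table(handle, sample_col="SampleID"):
    """Read a long-format CSV into {sample: {asv_id: reads}}.

    "SampleID" is a hypothetical column name; adjust it to the real table.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(handle):
        # MeasurementTypeID holds the ASV ID, MeasurementValue the read count.
        reads = int(float(row["MeasurementValue"]))
        counts[row[sample_col]][row["MeasurementTypeID"]] += reads
    return {sample: dict(asvs) for sample, asvs in counts.items()}


def filter_samples_by_depth(counts, min_reads=10_000):
    """Keep only samples whose total read count is strictly above min_reads."""
    return {
        sample: asvs
        for sample, asvs in counts.items()
        if sum(asvs.values()) > min_reads
    }


# Made-up demo rows, for illustration only (not taken from the dataset).
demo_csv = io.StringIO(
    "SampleID,MeasurementTypeID,MeasurementValue\n"
    "S1,ASV_1,9000\n"
    "S1,ASV_2,2000\n"
    "S2,ASV_1,10000\n"
)
demo_counts = load_long_table(demo_csv)
kept = filter_samples_by_depth(demo_counts)
print(sorted(kept))
```

In the demo rows, S1 (11,000 reads in total) passes the threshold, while S2 (exactly 10,000 reads) does not, because the comparison is strict ("more than 10,000 reads").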