Data Filtering

For each selected location or group of locations, this calculation excludes rare taxa from the dataset based on their rank distribution in density (number of cells per liter), biovolume per liter (μm³ of cells per liter) or carbon content per liter (pgC per liter).

Input values are:

  • Dataset (LifeWatch Dataset): dataset with individual or aggregated records, in LifeWatch format. The workflow runs only with data resources structured according to the LifeWatch Data Schema (Phytoplankton_LW_Data_template). You can select a dataset from the list or upload your own file.

  • Cluster: the spatial and temporal level at which the ranked distribution is computed. Aggregation can be done at spatial (i.e. ‘parenteventid’, ‘eventid’, ‘locality’, ‘country’) and/or temporal (i.e. ‘day’, ‘month’, ‘year’) levels.

  • Taxonomic levels: the taxonomic level at which the selection is made.

  • Size unit: the measure to be considered for the ranked distribution, one among “density” (number of cells per liter), “biovolumeliter” (μm³ of cells per liter) or “carboncontentliter” (pgC per liter). Default is “density”.

  • Threshold (between 0 and 1): threshold in the ranked distribution of taxa density, biovolume per liter or carbon content per liter above which taxa are retained in the dataset. Default is 1.

OUTPUT: a zip file containing the original dataset without the taxa that make up less than the designated threshold of cumulative density, biovolume or carbon content for each cluster of observations.
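
The selection described above can be illustrated with a minimal Python sketch. It is not the workflow's actual code, only one possible reading of the description: within each cluster, taxa are ranked by their summed size unit and the top-ranked taxa are kept until their cumulative share of the cluster total reaches the threshold. The function name `filter_rare_taxa`, the taxon column name `scientificname` and the sample records are hypothetical; `locality` and `density` are taken from the input values listed above.

```python
from collections import defaultdict


def filter_rare_taxa(records, cluster_keys, taxon_key,
                     size_key="density", threshold=1.0):
    """Hypothetical helper: drop rare taxa per cluster by cumulative share."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")

    # Sum the chosen size unit for every (cluster, taxon) pair.
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        cluster = tuple(rec[k] for k in cluster_keys)
        totals[cluster][rec[taxon_key]] += rec[size_key]

    # Rank taxa within each cluster (largest first) and keep them until
    # the cumulative share already retained reaches the threshold.
    kept = {}
    for cluster, per_taxon in totals.items():
        grand_total = sum(per_taxon.values())
        if grand_total == 0:
            # Assumption: a cluster with no measurable total is left untouched.
            kept[cluster] = set(per_taxon)
            continue
        ranked = sorted(per_taxon.items(), key=lambda kv: kv[1], reverse=True)
        retained, cumulative = set(), 0.0
        for taxon, value in ranked:
            if cumulative / grand_total >= threshold:
                break
            retained.add(taxon)
            cumulative += value
        kept[cluster] = retained

    # Return the original records, minus those of excluded taxa.
    return [
        rec for rec in records
        if rec[taxon_key] in kept[tuple(rec[k] for k in cluster_keys)]
    ]


# Made-up sample data, for illustration only.
sample = [
    {"locality": "A", "scientificname": "x", "density": 40},
    {"locality": "A", "scientificname": "x", "density": 30},
    {"locality": "A", "scientificname": "y", "density": 20},
    {"locality": "A", "scientificname": "z", "density": 10},
    {"locality": "B", "scientificname": "z", "density": 5},
]
filtered = filter_rare_taxa(sample, ["locality"], "scientificname",
                            size_key="density", threshold=0.8)
```

Under this reading, a threshold of 0.8 removes taxon `z` from locality A (taxa `x` and `y` already account for 90% of the total) but keeps it in locality B, where it is the only taxon. A threshold of 1 keeps every record, which matches the default behaving as "no filtering".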

Select the input values and start the execution with the “Run the workflow” button to calculate the selection.

