Data Filtering

The data filtering web service excludes rare taxa from the data files based on their ranked distribution in Density, Total Biovolume or Total Carbon Content. The dataset must contain at least one of these fields: Density, Total Biovolume or Total Carbon Content.
If none of these mandatory fields is present in the dataset, they can be obtained by using the dedicated Traits Computation web service.

The data filtering can be performed using different combinations of spatial, temporal and taxonomic levels. The threshold value (between 0 and 1) sets the cumulative contribution to the overall Density, Total Biovolume or Total Carbon Content that is retained.
For example, if the data filtering is done for the trait “Density” at the finest level of taxonomic classification, i.e. “scientificname”, with the threshold set to “0.75”, the taxa are ranked by their contribution to the overall Density of individuals in the sample, and the rarest taxa, which together account for the last 0.25 in cumulative contribution, will be removed.
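
The ranking and threshold logic described above can be sketched in Python. This is only an illustration and not the service's actual implementation: the function name `filter_rare_taxa` and the sample values are hypothetical, and it assumes that a taxon is kept while the cumulative contribution of the ranked taxa stays at or below the threshold. How the service treats a taxon that falls exactly on the threshold is not stated here.

```python
def filter_rare_taxa(totals, threshold):
    """Return the taxa kept after removing the rare tail of the ranking.

    totals    -- dict mapping a taxon name to its summed trait value
                 (for example Density); hypothetical input structure
    threshold -- cumulative contribution to retain, between 0 and 1
    """
    if not 0 <= threshold <= 1:
        raise ValueError("The threshold value must be between 0 and 1.")

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []

    # Rank taxa from the largest to the smallest contribution.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    kept = []
    cumulative = 0
    for taxon, value in ranked:
        cumulative += value
        # Assumption: keep a taxon while the cumulative share is <= threshold.
        if cumulative / grand_total <= threshold:
            kept.append(taxon)
    return kept


# Hypothetical densities per taxon; shares are 0.50, 0.25, 0.15 and 0.10.
densities = {"Taxon A": 50, "Taxon B": 25, "Taxon C": 15, "Taxon D": 10}
print(filter_rare_taxa(densities, 0.75))  # ['Taxon A', 'Taxon B']
```

With a threshold of 0.75, the two rarest taxa in this sample, which together account for 0.25 of the overall Density, are removed.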

The web service returns an output file in .csv format from which the filtered-out taxa have been removed.

To run the service, follow the operational steps below. The following requirements apply:

The input dataset must be a .csv file.

All required values, marked with (*) in the steps, must be specified:
  • dataset
  • trait
  • threshold value
  • taxonomic level

The threshold value must be numeric and between 0 and 1.
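
The requirements above can be checked before submitting a request. The following is a minimal sketch, assuming the inputs are available as plain Python values; the function `validate_inputs` is hypothetical and is not part of the web service.

```python
def validate_inputs(filename, trait, threshold, taxonomic_level):
    """Return a list of error messages; an empty list means the inputs pass."""
    errors = []

    # All four values are required to run the service.
    required = (filename, trait, threshold, taxonomic_level)
    if any(value is None or value == "" for value in required):
        errors.append("Please specify all required values to run the service.")

    # The input dataset must be a .csv file.
    if filename and not str(filename).lower().endswith(".csv"):
        errors.append("The input dataset must be a .csv file.")

    # The threshold must be numeric and between 0 and 1.
    if threshold is not None and threshold != "":
        try:
            number = float(threshold)
        except (TypeError, ValueError):
            errors.append("The threshold value must be numeric.")
        else:
            if not 0 <= number <= 1:
                errors.append("The threshold value must be between 0 and 1.")

    return errors


print(validate_inputs("phyto.csv", "Density", "0.75", "scientificname"))  # []
```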

Step 1 – upload your dataset (*)

Upload your dataset structured according to the Phytoplankton Data Template.

Step 2 – select the trait (*)

You can select the trait to be used for the ranked distribution: Density, Total Biovolume or Total Carbon Content.

Step 3 – set the threshold value (*)

You can type in the threshold value (from 0 to 1) for the data filtering.


Step 4 – select the taxonomic level (*)

You can select the taxonomic level to be used for the data filtering. The finest level is “scientificname”.

Step 5 – select the spatial and temporal levels

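These levels define how the data are grouped before filtering: the ranked distribution is computed separately within each combination of the selected spatial and temporal levels. If no spatial or temporal level is selected, the ranked distribution is computed on the whole dataset.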
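
The effect of the spatial and temporal levels can be illustrated with a short Python sketch. It is an assumption-laden example, not the service's code: the function `filter_by_group`, the column names `locality` and `density`, and the sample rows are hypothetical, and a taxon is assumed to be kept while the cumulative share within its group stays at or below the threshold.

```python
from collections import defaultdict


def filter_by_group(rows, trait, taxon_level, threshold, group_levels=()):
    """Filter rare taxa separately within each spatial/temporal group.

    rows         -- list of dicts, one per record of the dataset
    group_levels -- column names used for grouping; when empty, the
                    whole dataset is treated as a single group
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[level] for level in group_levels)
        groups[key].append(row)

    kept_rows = []
    for group in groups.values():
        # Sum the trait per taxon inside this group.
        totals = defaultdict(float)
        for row in group:
            totals[row[taxon_level]] += float(row[trait])
        grand_total = sum(totals.values())

        # Rank taxa and keep those within the cumulative threshold.
        kept, cumulative = set(), 0.0
        for taxon in sorted(totals, key=totals.get, reverse=True):
            cumulative += totals[taxon]
            if grand_total > 0 and cumulative / grand_total <= threshold:
                kept.add(taxon)

        kept_rows.extend(row for row in group if row[taxon_level] in kept)
    return kept_rows


rows = [
    {"locality": "L1", "scientificname": "A", "density": 60},
    {"locality": "L1", "scientificname": "B", "density": 30},
    {"locality": "L1", "scientificname": "C", "density": 10},
    {"locality": "L2", "scientificname": "C", "density": 75},
    {"locality": "L2", "scientificname": "A", "density": 15},
    {"locality": "L2", "scientificname": "B", "density": 10},
]

per_site = filter_by_group(rows, "density", "scientificname", 0.85, ["locality"])
overall = filter_by_group(rows, "density", "scientificname", 0.85)
print(len(per_site), len(overall))  # 2 4
```

In this sample, grouping by locality keeps only the dominant taxon at each site, whereas filtering the whole dataset keeps taxa A and C everywhere, because their contributions are pooled across sites before ranking.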