Diversity of fish and vulnerable species in Marine World Heritage Sites based on OBIS data

This notebook explores species diversity in Marine World Heritage Sites using OBIS data. This is work in progress.

How to use this notebook

Optionally remove the following cache files before running the notebook:

  • data/occurrence.Rdata: occurrences by site
  • data/bold.Rdata: statistics on barcode sequences in BOLD for all fish species

If necessary, adjust fish and coral taxonomy below:

fish_classes <- c("Actinopteri", "Cephalaspidomorphi", "Myxini", "Petromyzonti", "Elasmobranchii", "Holocephali", "Coelacanthi", "Chondrostei", "Teleostei")
fish_orders <- c("Cetomimiformes", "Gasterosteiformes", "Scorpaeniformes", "Stephanoberyciformes")
coral_classes <- c("Anthozoa")
not_coral_orders <- c("Penicillaria", "Spirularia", "Actiniaria")

Other resources

A separate repository has been set up for compiling species list based on existing publications, see https://github.com/iobis/mwhs-species-lists.

Dependencies

library(dplyr)
library(caspr)
library(rredlist)
library(knitr)
library(ggplot2)
library(sf)
library(mapview)
library(concaveman)
library(sfheaders)
library(purrr)
library(robis)
library(stringr)
library(ftplottools)
library(ggrepel)

Compiling a list of all WoRMS species including IUCN Red List category and BOLD barcode statistics

First read all accepted WoRMS species. As it’s not possible to get a full species list from the WoRMS or GBIF web services, I’m reading directly from an export provided by the WoRMS team. The list of fish classes and orders above is used to determine if species are fish or not.

worms <- read.csv("data/taxon.txt", sep = "\t", quote = "") %>%
  as_tibble() %>%
  filter(taxonRank == "Species" & taxonomicStatus == "accepted") %>%
  select(taxonID, scientificName, kingdom, phylum, class, order, family, genus) %>%
  distinct() %>%
  mutate(is_fish = class %in% fish_classes | order %in% fish_orders) %>%
  mutate(is_coral = class %in% coral_classes & !(order %in% not_coral_orders)) %>%
  mutate(aphiaID = as.integer(str_replace(taxonID, "urn:lsid:marinespecies.org:taxname:", ""))) %>%
  mutate_all(~na_if(., ""))

Assign Red List categories

Here I’m using the rredlist package to get all Red List species from IUCN. Only the extinct and threatened categories are kept: CR, EN, VU, EW, EX.

get_redlist_species <- function() {
  redlist <- tibble()
  page <- 0
  while (TRUE) {
    res <- rl_sp(page, key = "a936c4f78881e79a326e73c4f97f34a6e7d8f9f9e84342bff73c3ceda14992b9")$result
    if (length(res) == 0) {
      break
    }
    redlist <- bind_rows(redlist, res)
    page <- page + 1
  }
  redlist <- redlist %>%
    as_tibble() %>%
    filter(category %in% c("CR", "EN", "VU", "EW", "EX")) %>%
    mutate(category = factor(category, levels = c("EX", "EW", "CR", "EN", "VU"))) %>%
    group_by(scientific_name) %>%
    filter(row_number() == 1) %>%
    ungroup()
  return(redlist)  
}

redlist <- get_redlist_species()

Now we can label Red List species in the worms data frame.

worms <- worms %>%
  left_join(redlist %>% select(scientific_name, category), by = c("scientificName" = "scientific_name"))

Export WoRMS list

openxlsx::write.xlsx(worms, "output/worms_species.xlsx", rowNames = FALSE)
write.csv(worms, "output/worms_species.csv", row.names = FALSE)

Get BOLD barcode statistics for all fish species (SLOW)

Let’s check BOLD for barcode sequences using all fish species in WoRMS:

# fixes issue with bold package
# invisible(Sys.setlocale("LC_ALL", "C"))

if (!file.exists("data/bold.Rdata")) {
  fish_species <- worms %>% filter(is_fish) %>% pull(scientificName) %>% unique()
  bold_list <- sapply(fish_species, function(x) NULL)
  for (i in 1:length(bold_list)) {
    message(i, " ", names(bold_list)[i])
    if (is.null(bold_list[[i]])) {
      bold_list[[i]] <- tryCatch({
        caspr::bold_statistics(names(bold_list)[i])
      }, warning = function(warning_condition) {
      }, error = function(error_condition) {
      }, finally = {
      })
    }
  }
  save(bold_list, file = "data/bold.Rdata")
} else {
  load("data/bold.Rdata")
}

sequence_numbers <- unlist(map(bold_list, nrow))
fish_sequences <- tibble(species = names(sequence_numbers), sequences = sequence_numbers) %>%
  filter(sequences > 0)

worms <- worms %>%
  left_join(fish_sequences, by = c("scientificName" = "species"))

Statistics

Now calculate some statistics:

worms %>%
  summarize(
    species = n(),
    fish = sum(is_fish),
    vulnerable = length(na.omit(category)),
    vulnerable_fish = sum(is_fish * !is.na(category)),
    barcode_fish = sum(sequences > 0, na.rm = T)
  ) %>%
  kable(format.args = list(big.mark = ","))
species fish vulnerable vulnerable_fish barcode_fish
224,570 19,918 1,565 879 751

Note that these numbers are slightly inflated due to homonyms, even after removing unaccepted names.

ggplot(worms %>% filter(!is.na(category))) +
  geom_bar(aes(x = phylum, fill = category)) +
  coord_flip() +
  scale_fill_viridis_d(direction = -1)

Marine World Heritage Sites statistics

In this section we will look at the diversity of fish and vulnerable species in each marine World Heritage site.

Fetch spatial data

Spatial features for the marine World Heritage sites have been prepared in another repository, see https://github.com/iobis/mwhs-shapes.

if (!file.exists("data/marine_world_heritage.gpkg")) {
  download.file("https://github.com/iobis/mwhs-shapes/blob/master/output/marine_world_heritage.gpkg?raw=true", "data/marine_world_heritage.gpkg")
}

shapes <- st_read("data/marine_world_heritage.gpkg")
## Reading layer `marine_world_heritage' from data source 
##   `/Users/pieter/IPOfI Dropbox/Pieter Provoost/werk/projects/eDNA/notebook-mwhs/data/marine_world_heritage.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 60 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -55.32282 xmax: 180 ymax: 71.81381
## Geodetic CRS:  WGS 84

Process spatial data

For some sites, the GeoPackage has core as well as buffer areas. Merge the geometries by site.

shapes_processed <- shapes %>%
  group_by(name) %>%
  summarize()

Fetch occurrence data

Now retrieve data from OBIS by area. OBIS is queried by bounding box, but a point-in-polygon calculation is used to discard points outside the areas.

occ_for_geom <- function(geom) {
  wkt <- st_as_text(st_as_sfc(st_bbox(geom)), digits = 6)
  message(wkt)
  occ <- occurrence(geometry = wkt, fields = c("decimalLongitude", "decimalLatitude", "date_year", "scientificName", "aphiaID", "dataset_id")) %>%
    st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)
  occ_filtered <- occ %>%
    filter(st_intersects(geometry, geom, sparse = FALSE)) %>%
    as_tibble() %>%
    select(-geometry)
  return(occ_filtered)
}

sf_use_s2(FALSE)

if (!file.exists("data/occurrence.Rdata")) {
  occs <- map(shapes_processed$geom, occ_for_geom)
  for (i in 1:nrow(shapes_processed)) {
    occs[[i]]$name <- shapes_processed$name[i]
  }
  occ <- bind_rows(occs)
  save(occ, file = "data/occurrence.Rdata")
} else {
  load("data/occurrence.Rdata")
}

occ <- occ %>%
  inner_join(worms %>% select(phylum, class, order, family, genus, is_fish, is_coral, aphiaID, category, sequences), by = "aphiaID") %>%
  mutate(
    aphia_fish = ifelse(is_fish == TRUE, aphiaID, NA),
    aphia_vulnerable = ifelse(!is.na(category), aphiaID, NA),
    aphia_vulnerable_fish = ifelse(is_fish == TRUE & !is.na(category), aphiaID, NA),
    aphia_barcode_fish = ifelse(is_fish == TRUE & sequences > 0, aphiaID, NA),
  ) 

Calculate statistics

occ %>%
  summarize(
    records = n(),
    species = length(unique(aphiaID)),
    fish = length(unique(aphia_fish)),
    vulnerable = length(unique(aphia_vulnerable)),
    vulnerable_fish = length(unique(aphia_vulnerable_fish)),
    barcode_fish =  length(unique(aphia_barcode_fish))
  ) %>%
  kable(format.args = list(big.mark = ","))
records species fish vulnerable vulnerable_fish barcode_fish
3,588,181 28,258 6,700 550 298 271
site_stats <- occ %>%
  group_by(name) %>%
  summarize(
    records = n(),
    species = length(unique(aphiaID)),
    fish = length(unique(aphia_fish)),
    vulnerable = length(unique(aphia_vulnerable)),
    vulnerable_fish = length(unique(aphia_vulnerable_fish)),
    barcode_fish =  length(unique(aphia_barcode_fish))
  )

xlsx::write.xlsx(site_stats, file = "output/sites.xlsx")

site_stats %>%
  kable(format.args = list(big.mark = ","))
name records species fish vulnerable vulnerable_fish barcode_fish
Aldabra Atoll 2,946 1,038 498 81 9 7
Archipiélago de Revillagigedo 9,021 469 376 62 49 25
Area de Conservación Guanacaste 1,135 359 345 15 12 8
Banc d’Arguin National Park 1,224 44 23 5 4 3
Belize Barrier Reef Reserve System 8,790 1,082 345 50 28 11
Brazilian Atlantic Islands: Fernando de Noronha and Atol das Rocas Reserves 25,893 542 165 15 11 11
Cocos Island National Park 2,429 531 408 55 50 29
Coiba National Park and its Special Zone of Marine Protection 2,522 593 439 30 26 16
East Rennell 27 9 6 1 1 1
Everglades National Park 60,414 935 285 30 19 10
French Austral Lands and Seas 165,351 1,458 277 25 6 18
Galápagos Islands 60,148 1,628 729 89 67 36
Gough and Inaccessible Islands 2,497 198 46 12 5 6
Great Barrier Reef 1,309,086 12,915 2,834 238 93 127
Gulf of Porto: Calanche of Piana, Gulf of Girolata, Scandola Reserve 113 21 2 5 1 1
Ha Long Bay 1,967 326 318 8 8 4
Heard and McDonald Islands 38,615 257 29 10 1 4
High Coast / Kvarken Archipelago 16,510 128 19 2 2 1
Ibiza, Biodiversity and Culture 72 35 14 2 1 2
iSimangaliso Wetland Park 25,036 2,294 1,100 88 67 57
Islands and Protected Areas of the Gulf of California 17,184 1,308 772 76 56 55
Kluane / Wrangell-St Elias / Glacier Bay / Tatshenshini-Alsek 965 122 91 8 2 5
Komodo National Park 682 363 217 12 10 8
Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems 30,368 4,539 871 44 16 13
Lord Howe Island Group 39,412 1,774 484 45 15 18
Macquarie Island 89,036 652 98 14 1 2
Malpelo Fauna and Flora Sanctuary 8,421 395 269 50 38 24
Natural System of Wrangel Island Reserve 295 63 11 7 1 1
New Zealand Sub-Antarctic Islands 10,681 1,283 78 28 8 16
Ningaloo Coast 346,971 3,195 927 104 52 64
Ogasawara Islands 444 239 120 19 5 4
Papahānaumokuākea 271,733 1,739 586 43 18 15
Pennsula Valds 4,479 215 6 8 1 1
Phoenix Islands Protected Area 3,337 670 362 10 6 3
Puerto Princesa Subterranean River National Park 1 1 1 1 1 1
Rock Islands Southern Lagoon 22,417 2,232 1,062 53 16 7
Sanganeb Marine National Park and Dungonab Bay - Mukkawar Island Marine National Park 643 150 9 9 2 2
Shark Bay, Western Australia 13,290 1,347 538 49 17 32
Shiretoko 291 59 57 2 1 1
Sian Ka’an 2,513 226 110 30 9 11
Socotra Archipelago 4,325 595 119 21 2 1
St Kilda 7,674 394 31 9 3 3
Surtsey 7 3 1 3 1 1
The Wadden Sea 977,943 749 101 18 12 10
Tubbataha Reefs Natural Park 209 63 9 9 5 5
Ujung Kulon National Park 3 2 1 2 1 1
West Norwegian Fjords – Geirangerfjord and Nærøyfjord 17 8 1 2 1 1
Whale Sanctuary of El Vizcaino 1,044 168 124 12 11 18
subset <- site_stats %>%
  filter(name %in% c("Great Barrier Reef", "Belize Barrier Reef Reserve System", "Ha Long Bay", "The Wadden Sea", "Heard and McDonald Islands", "Aldabra Atoll")) %>%
  arrange(name)

subset$x <- c(600, 100, 800, 500, 10, 60)
subset$y <- c(140, 80, 210, 3, 20, 40)

ggplot() +
  geom_segment(subset, mapping = aes(x = x, y = y, xend = fish, yend = vulnerable)) +
  geom_label(subset, mapping = aes(x = x, y = y, label = name), size = 3.5, label.size = 0) +
  geom_point(site_stats, mapping = aes(x = fish, y = vulnerable, size = species, color = species), shape = 21, stroke = 1.2, fill = "white") +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(trans = "log10") +
  scale_size(range = c(1, 8), breaks = c(1000, 5000, 10000)) +
  scale_color_viridis_c(end = 0.8, trans = "log10", breaks = c(1000, 5000, 10000)) +
  guides(color = guide_legend(), size = guide_legend()) +
  xlab("fish species") + ylab("vulnerable species") +
  ft_theme() +
  ggtitle("Fish and vulnerable species diversity at marine World Heritate sites")

ggsave("output/sites.png", width = 10, height = 6, dpi = 600)

Species lists by site

species_lists <- occ %>%
  group_by(name, scientificName, aphiaID, phylum, class, order, family, genus, is_fish, is_coral, category, sequences) %>%
  summarize() %>%
  select(site = name, scientificName, aphiaID, phylum, class, order, family, genus, is_fish, is_coral, iucn_category = category, bold_sequences = sequences) %>%
  arrange(site, phylum, class, order, family, scientificName) %>%
  ungroup()

write.csv(species_lists, file = "output/species_lists.csv", row.names = FALSE, na = "")
xlsx::write.xlsx(as.data.frame(species_lists), file = "output/species_lists.xlsx", showNA = FALSE, row.names = FALSE)
xlsx::write.xlsx(as.data.frame(species_lists %>% filter(is_fish)), file = "output/species_lists_fish.xlsx", showNA = FALSE, row.names = FALSE)