Diversity of fish and vulnerable species in Marine World Heritage Sites based on OBIS data
This notebook explores species diversity in Marine World Heritage Sites using OBIS data. This is work in progress.
How to use this notebook
Optionally remove the following cache files before running the notebook:
data/occurrence.Rdata
: occurrences by sitedata/bold.Rdata
: statistics on barcode sequences in BOLD for all fish species
If necessary, adjust fish and coral taxonomy below:
fish_classes <- c("Actinopteri", "Cephalaspidomorphi", "Myxini", "Petromyzonti", "Elasmobranchii", "Holocephali", "Coelacanthi", "Chondrostei", "Teleostei")
fish_orders <- c("Cetomimiformes", "Gasterosteiformes", "Scorpaeniformes", "Stephanoberyciformes")
coral_classes <- c("Anthozoa")
not_coral_orders <- c("Penicillaria", "Spirularia", "Actiniaria")
Other resources
A separate repository has been set up for compiling species list based on existing publications, see https://github.com/iobis/mwhs-species-lists.
Dependencies
Compiling a list of all WoRMS species including IUCN Red List category and BOLD barcode statistics
First read all accepted WoRMS species. As it’s not possible to get a full species list from the WoRMS or GBIF web services, I’m reading directly from an export provided by the WoRMS team. The list of fish classes and orders above is used to determine if species are fish or not.
worms <- read.csv("data/taxon.txt", sep = "\t", quote = "") %>%
as_tibble() %>%
filter(taxonRank == "Species" & taxonomicStatus == "accepted") %>%
select(taxonID, scientificName, kingdom, phylum, class, order, family, genus) %>%
distinct() %>%
mutate(is_fish = class %in% fish_classes | order %in% fish_orders) %>%
mutate(is_coral = class %in% coral_classes & !(order %in% not_coral_orders)) %>%
mutate(aphiaID = as.integer(str_replace(taxonID, "urn:lsid:marinespecies.org:taxname:", ""))) %>%
mutate_all(~na_if(., ""))
Assign Red List categories
Here I’m using the rredlist
package to get all Red List
species from IUCN. Only the extinct and threatened categories are kept:
CR
, EN
, VU
, EW
,
EX
.
get_redlist_species <- function() {
redlist <- tibble()
page <- 0
while (TRUE) {
res <- rl_sp(page, key = "a936c4f78881e79a326e73c4f97f34a6e7d8f9f9e84342bff73c3ceda14992b9")$result
if (length(res) == 0) {
break
}
redlist <- bind_rows(redlist, res)
page <- page + 1
}
redlist <- redlist %>%
as_tibble() %>%
filter(category %in% c("CR", "EN", "VU", "EW", "EX")) %>%
mutate(category = factor(category, levels = c("EX", "EW", "CR", "EN", "VU"))) %>%
group_by(scientific_name) %>%
filter(row_number() == 1) %>%
ungroup()
return(redlist)
}
redlist <- get_redlist_species()
Now we can label Red List species in the worms
data
frame.
Export WoRMS list
Get BOLD barcode statistics for all fish species (SLOW)
Let’s check BOLD for barcode sequences using all fish species in WoRMS:
# fixes issue with bold package
# invisible(Sys.setlocale("LC_ALL", "C"))
if (!file.exists("data/bold.Rdata")) {
fish_species <- worms %>% filter(is_fish) %>% pull(scientificName) %>% unique()
bold_list <- sapply(fish_species, function(x) NULL)
for (i in 1:length(bold_list)) {
message(i, " ", names(bold_list)[i])
if (is.null(bold_list[[i]])) {
bold_list[[i]] <- tryCatch({
caspr::bold_statistics(names(bold_list)[i])
}, warning = function(warning_condition) {
}, error = function(error_condition) {
}, finally = {
})
}
}
save(bold_list, file = "data/bold.Rdata")
} else {
load("data/bold.Rdata")
}
sequence_numbers <- unlist(map(bold_list, nrow))
fish_sequences <- tibble(species = names(sequence_numbers), sequences = sequence_numbers) %>%
filter(sequences > 0)
worms <- worms %>%
left_join(fish_sequences, by = c("scientificName" = "species"))
Statistics
Now calculate some statistics:
worms %>%
summarize(
species = n(),
fish = sum(is_fish),
vulnerable = length(na.omit(category)),
vulnerable_fish = sum(is_fish * !is.na(category)),
barcode_fish = sum(sequences > 0, na.rm = T)
) %>%
kable(format.args = list(big.mark = ","))
species | fish | vulnerable | vulnerable_fish | barcode_fish |
---|---|---|---|---|
224,570 | 19,918 | 1,565 | 879 | 751 |
Note that these numbers are slightly inflated due to homonyms, even after removing unaccepted names.
Marine World Heritage Sites statistics
In this section we will look at the diversity of fish and vulnerable species in each marine World Heritage site.
Fetch spatial data
Spatial features for the marine World Heritage sites have been prepared in another repository, see https://github.com/iobis/mwhs-shapes.
if (!file.exists("data/marine_world_heritage.gpkg")) {
download.file("https://github.com/iobis/mwhs-shapes/blob/master/output/marine_world_heritage.gpkg?raw=true", "data/marine_world_heritage.gpkg")
}
shapes <- st_read("data/marine_world_heritage.gpkg")
## Reading layer `marine_world_heritage' from data source
## `/Users/pieter/IPOfI Dropbox/Pieter Provoost/werk/projects/eDNA/notebook-mwhs/data/marine_world_heritage.gpkg'
## using driver `GPKG'
## Simple feature collection with 60 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -55.32282 xmax: 180 ymax: 71.81381
## Geodetic CRS: WGS 84
Process spatial data
For some sites, the GeoPackage has core as well as buffer areas. Merge the geometries by site.
Fetch occurrence data
Now retrieve data from OBIS by area. OBIS is queried by bounding box, but a point-in-polygon calculation is used to discard points outside the areas.
occ_for_geom <- function(geom) {
wkt <- st_as_text(st_as_sfc(st_bbox(geom)), digits = 6)
message(wkt)
occ <- occurrence(geometry = wkt, fields = c("decimalLongitude", "decimalLatitude", "date_year", "scientificName", "aphiaID", "dataset_id")) %>%
st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)
occ_filtered <- occ %>%
filter(st_intersects(geometry, geom, sparse = FALSE)) %>%
as_tibble() %>%
select(-geometry)
return(occ_filtered)
}
sf_use_s2(FALSE)
if (!file.exists("data/occurrence.Rdata")) {
occs <- map(shapes_processed$geom, occ_for_geom)
for (i in 1:nrow(shapes_processed)) {
occs[[i]]$name <- shapes_processed$name[i]
}
occ <- bind_rows(occs)
save(occ, file = "data/occurrence.Rdata")
} else {
load("data/occurrence.Rdata")
}
occ <- occ %>%
inner_join(worms %>% select(phylum, class, order, family, genus, is_fish, is_coral, aphiaID, category, sequences), by = "aphiaID") %>%
mutate(
aphia_fish = ifelse(is_fish == TRUE, aphiaID, NA),
aphia_vulnerable = ifelse(!is.na(category), aphiaID, NA),
aphia_vulnerable_fish = ifelse(is_fish == TRUE & !is.na(category), aphiaID, NA),
aphia_barcode_fish = ifelse(is_fish == TRUE & sequences > 0, aphiaID, NA),
)
Calculate statistics
occ %>%
summarize(
records = n(),
species = length(unique(aphiaID)),
fish = length(unique(aphia_fish)),
vulnerable = length(unique(aphia_vulnerable)),
vulnerable_fish = length(unique(aphia_vulnerable_fish)),
barcode_fish = length(unique(aphia_barcode_fish))
) %>%
kable(format.args = list(big.mark = ","))
records | species | fish | vulnerable | vulnerable_fish | barcode_fish |
---|---|---|---|---|---|
3,588,181 | 28,258 | 6,700 | 550 | 298 | 271 |
site_stats <- occ %>%
group_by(name) %>%
summarize(
records = n(),
species = length(unique(aphiaID)),
fish = length(unique(aphia_fish)),
vulnerable = length(unique(aphia_vulnerable)),
vulnerable_fish = length(unique(aphia_vulnerable_fish)),
barcode_fish = length(unique(aphia_barcode_fish))
)
xlsx::write.xlsx(site_stats, file = "output/sites.xlsx")
site_stats %>%
kable(format.args = list(big.mark = ","))
name | records | species | fish | vulnerable | vulnerable_fish | barcode_fish |
---|---|---|---|---|---|---|
Aldabra Atoll | 2,946 | 1,038 | 498 | 81 | 9 | 7 |
Archipiélago de Revillagigedo | 9,021 | 469 | 376 | 62 | 49 | 25 |
Area de Conservación Guanacaste | 1,135 | 359 | 345 | 15 | 12 | 8 |
Banc d’Arguin National Park | 1,224 | 44 | 23 | 5 | 4 | 3 |
Belize Barrier Reef Reserve System | 8,790 | 1,082 | 345 | 50 | 28 | 11 |
Brazilian Atlantic Islands: Fernando de Noronha and Atol das Rocas Reserves | 25,893 | 542 | 165 | 15 | 11 | 11 |
Cocos Island National Park | 2,429 | 531 | 408 | 55 | 50 | 29 |
Coiba National Park and its Special Zone of Marine Protection | 2,522 | 593 | 439 | 30 | 26 | 16 |
East Rennell | 27 | 9 | 6 | 1 | 1 | 1 |
Everglades National Park | 60,414 | 935 | 285 | 30 | 19 | 10 |
French Austral Lands and Seas | 165,351 | 1,458 | 277 | 25 | 6 | 18 |
Galápagos Islands | 60,148 | 1,628 | 729 | 89 | 67 | 36 |
Gough and Inaccessible Islands | 2,497 | 198 | 46 | 12 | 5 | 6 |
Great Barrier Reef | 1,309,086 | 12,915 | 2,834 | 238 | 93 | 127 |
Gulf of Porto: Calanche of Piana, Gulf of Girolata, Scandola Reserve | 113 | 21 | 2 | 5 | 1 | 1 |
Ha Long Bay | 1,967 | 326 | 318 | 8 | 8 | 4 |
Heard and McDonald Islands | 38,615 | 257 | 29 | 10 | 1 | 4 |
High Coast / Kvarken Archipelago | 16,510 | 128 | 19 | 2 | 2 | 1 |
Ibiza, Biodiversity and Culture | 72 | 35 | 14 | 2 | 1 | 2 |
iSimangaliso Wetland Park | 25,036 | 2,294 | 1,100 | 88 | 67 | 57 |
Islands and Protected Areas of the Gulf of California | 17,184 | 1,308 | 772 | 76 | 56 | 55 |
Kluane / Wrangell-St Elias / Glacier Bay / Tatshenshini-Alsek | 965 | 122 | 91 | 8 | 2 | 5 |
Komodo National Park | 682 | 363 | 217 | 12 | 10 | 8 |
Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems | 30,368 | 4,539 | 871 | 44 | 16 | 13 |
Lord Howe Island Group | 39,412 | 1,774 | 484 | 45 | 15 | 18 |
Macquarie Island | 89,036 | 652 | 98 | 14 | 1 | 2 |
Malpelo Fauna and Flora Sanctuary | 8,421 | 395 | 269 | 50 | 38 | 24 |
Natural System of Wrangel Island Reserve | 295 | 63 | 11 | 7 | 1 | 1 |
New Zealand Sub-Antarctic Islands | 10,681 | 1,283 | 78 | 28 | 8 | 16 |
Ningaloo Coast | 346,971 | 3,195 | 927 | 104 | 52 | 64 |
Ogasawara Islands | 444 | 239 | 120 | 19 | 5 | 4 |
Papahānaumokuākea | 271,733 | 1,739 | 586 | 43 | 18 | 15 |
Pen |
4,479 | 215 | 6 | 8 | 1 | 1 |
Phoenix Islands Protected Area | 3,337 | 670 | 362 | 10 | 6 | 3 |
Puerto Princesa Subterranean River National Park | 1 | 1 | 1 | 1 | 1 | 1 |
Rock Islands Southern Lagoon | 22,417 | 2,232 | 1,062 | 53 | 16 | 7 |
Sanganeb Marine National Park and Dungonab Bay - Mukkawar Island Marine National Park | 643 | 150 | 9 | 9 | 2 | 2 |
Shark Bay, Western Australia | 13,290 | 1,347 | 538 | 49 | 17 | 32 |
Shiretoko | 291 | 59 | 57 | 2 | 1 | 1 |
Sian Ka’an | 2,513 | 226 | 110 | 30 | 9 | 11 |
Socotra Archipelago | 4,325 | 595 | 119 | 21 | 2 | 1 |
St Kilda | 7,674 | 394 | 31 | 9 | 3 | 3 |
Surtsey | 7 | 3 | 1 | 3 | 1 | 1 |
The Wadden Sea | 977,943 | 749 | 101 | 18 | 12 | 10 |
Tubbataha Reefs Natural Park | 209 | 63 | 9 | 9 | 5 | 5 |
Ujung Kulon National Park | 3 | 2 | 1 | 2 | 1 | 1 |
West Norwegian Fjords – Geirangerfjord and Nærøyfjord | 17 | 8 | 1 | 2 | 1 | 1 |
Whale Sanctuary of El Vizcaino | 1,044 | 168 | 124 | 12 | 11 | 18 |
subset <- site_stats %>%
filter(name %in% c("Great Barrier Reef", "Belize Barrier Reef Reserve System", "Ha Long Bay", "The Wadden Sea", "Heard and McDonald Islands", "Aldabra Atoll")) %>%
arrange(name)
subset$x <- c(600, 100, 800, 500, 10, 60)
subset$y <- c(140, 80, 210, 3, 20, 40)
ggplot() +
geom_segment(subset, mapping = aes(x = x, y = y, xend = fish, yend = vulnerable)) +
geom_label(subset, mapping = aes(x = x, y = y, label = name), size = 3.5, label.size = 0) +
geom_point(site_stats, mapping = aes(x = fish, y = vulnerable, size = species, color = species), shape = 21, stroke = 1.2, fill = "white") +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10") +
scale_size(range = c(1, 8), breaks = c(1000, 5000, 10000)) +
scale_color_viridis_c(end = 0.8, trans = "log10", breaks = c(1000, 5000, 10000)) +
guides(color = guide_legend(), size = guide_legend()) +
xlab("fish species") + ylab("vulnerable species") +
ft_theme() +
ggtitle("Fish and vulnerable species diversity at marine World Heritate sites")
Species lists by site
species_lists <- occ %>%
group_by(name, scientificName, aphiaID, phylum, class, order, family, genus, is_fish, is_coral, category, sequences) %>%
summarize() %>%
select(site = name, scientificName, aphiaID, phylum, class, order, family, genus, is_fish, is_coral, iucn_category = category, bold_sequences = sequences) %>%
arrange(site, phylum, class, order, family, scientificName) %>%
ungroup()
write.csv(species_lists, file = "output/species_lists.csv", row.names = FALSE, na = "")
xlsx::write.xlsx(as.data.frame(species_lists), file = "output/species_lists.xlsx", showNA = FALSE, row.names = FALSE)
xlsx::write.xlsx(as.data.frame(species_lists %>% filter(is_fish)), file = "output/species_lists_fish.xlsx", showNA = FALSE, row.names = FALSE)