3  Species distribution models

3.1 Introduction

MPA Europe will produce a priority map of conservation areas in Europe, designed to protect the highest proportion of biodiversity. Consequently, a critical aspect of the project is understanding species distributions. Despite the growing number of available records in OBIS and other databases, significant gaps remain in our knowledge of species ranges. To address this, MPA Europe is utilizing species distribution models (SDMs) to generate high-resolution (~5 km grid) species range maps.

SDMs are statistical models used to predict the potential geographic distribution of a species based on the relationship between observed species occurrences and environmental variables. These models estimate areas where environmental conditions are suitable for the presence of a species, producing maps of predicted habitat suitability¹. More details about SDMs can be found in the review by Elith and Leathwick (2009) and, more broadly on ecological niche modelling, in Peterson et al. (2011).

In simplified terms, species data (information on where the species occurs and, ideally, where it is absent) and environmental data (e.g. temperature, salinity) are used as input to a model (see example here). A statistical model is an abstraction² that helps us understand the relationship between variables. Many algorithms can be used to produce a species distribution model, depending on the available data. Once the model is fitted, it can be used to predict the suitability of new locations from their local environmental conditions.
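As a rough illustration of this input, fit and predict cycle, the sketch below uses synthetic data and a plain logistic regression on presence/absence records. This is only one of the many possible algorithms and is not the approach adopted in this project. The data, the coefficients in `true_beta`, and the helper names `features`, `fit_logistic` and `predict_suitability` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)


def features(temperature):
    """Intercept, linear and quadratic terms of standardised temperature."""
    z = (np.asarray(temperature, dtype=float) - 15.0) / 10.0
    return np.column_stack([np.ones_like(z), z, z ** 2])


# Synthetic environmental data: sea temperature (deg C) at 500 surveyed sites.
site_temperature = rng.uniform(0.0, 30.0, size=500)

# Synthetic species data: 1 = present, 0 = absent. The species is assumed
# (for the demo only) to prefer temperatures around 15 deg C.
true_beta = np.array([1.5, 0.0, -4.0])
true_prob = 1.0 / (1.0 + np.exp(-features(site_temperature) @ true_beta))
observed = rng.binomial(1, true_prob)


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def predict_suitability(beta, temperature):
    """Predicted suitability (0 to 1) at new locations."""
    return 1.0 / (1.0 + np.exp(-features(temperature) @ beta))


# Fit the model, then predict at three new locations.
beta_hat = fit_logistic(features(site_temperature), observed)
new_temperature = np.array([2.0, 15.0, 28.0])
suitability = predict_suitability(beta_hat, new_temperature)
print(dict(zip(new_temperature.tolist(), suitability.round(3).tolist())))
```

The quadratic term lets suitability peak at intermediate temperatures, so the location at 15 °C should receive a higher score than the cold and warm extremes.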

%%{
  init: {
    'themeVariables': {
      'lineColor': '#757575'
    }
  }
}%%

flowchart LR
    n1["Environmental data"]
    n2["Model"]
    n3["Species data"]
    n4["Range map"]
    classDef commonStyle color:#000000,fill:#D9D9D9,stroke:#737373,stroke-width:1px
    class n1,n3 commonStyle
    n1 --> n2
    style n2 fill:#016DD7,stroke:#016DD7,color:white
    n3 --> n2
    n2 --> n4
    style n4 fill:#00D0B1,stroke:#00D0B1,color:white
    linkStyle 0,1,2 stroke:#757575,stroke-width:0.5px,marker-width:0.5px

Of course, this is just a simplification. Many steps are necessary to obtain the data, process it, fit the models, evaluate them and finally make predictions. In this project we followed these steps:

  1. Obtain species list (all species occurring in the study area)
  2. Obtain occurrence data for the selected species
  3. Apply quality control procedures to the occurrence data
  4. Obtain environmental data
  5. Obtain ecological information (in our case, the habitat of species, benthic or pelagic)
  6. Fit models
  7. Evaluate models
  8. Produce map predictions

Each of these steps is detailed in the following sections of this documentation. But first we describe the statistical framework adopted in WP3.

3.2 Framework used in this project

All modeling was done within a point process framework. Spatial point process models (PPMs) are used to model any type of event that arises as points in space (Renner et al. 2015). The number and location of those points are random, but they are related to an underlying process. Although PPMs are widely used in spatial analysis (applications range from disease mapping to the detection of landslides; e.g. Lombardo, Opitz, and Huser 2018), they have only recently been applied more intensively to SDMs (Renner et al. 2015; Warton and Shepherd 2010).
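For reference, one common formulation is the inhomogeneous Poisson point process with a log-linear intensity, as discussed by Renner et al. (2015). The notation below is introduced here for illustration and is not taken from the project code: $\lambda(s)$ is the intensity (expected number of records per unit area) at location $s$, $\mathbf{x}(s)$ the environmental variables at $s$, $A$ the study area, and $s_1, \dots, s_n$ the recorded locations.

$$
\ln \lambda(s) = \beta_0 + \mathbf{x}(s)^\top \boldsymbol{\beta},
\qquad
\ell(\beta_0, \boldsymbol{\beta}) = \sum_{i=1}^{n} \ln \lambda(s_i) - \int_{A} \lambda(s)\, ds + \text{const.}
$$

The first term rewards high intensity where the species was recorded, while the integral penalises high intensity over the study area as a whole.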

The interest in PPMs for SDMs was mainly driven by the challenge of modelling presence-only data, where the only information available is the geographical locations at which the species was recorded (Fithian and Hastie 2013). Usually, one would sample pseudo-absences (points that should reflect places where the species is absent), but the number and placement of pseudo-absences can have a great influence on the models (Barbet-Massin et al. 2012). There is also no clear justification for pseudo-absences: the location was not sampled, so it is impossible to know whether the species is really absent. In a PPM framework, by contrast, quadrature (or background) points are used only as a device to describe the available environmental conditions (Fithian and Hastie 2013). Points are sampled at random, and the number of points can be chosen based on the accuracy of the likelihood estimate (Renner et al. 2015). In theory, all the points of the environment could be used, but this would increase computational time.
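The role of quadrature points can be sketched with a small simulation. The example below assumes a Poisson point process with log-linear intensity and approximates the integral of the intensity over the study area by a weighted sum over quadrature points, each weight being the area that point represents. The one-dimensional study area, the coefficients in `true_beta`, and the helper `fit_ppm` are hypothetical and chosen only for the demonstration; this is not the fitting code used in the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical study area of total area 1, split into 2000 equal cells
# along a standardised environmental gradient.
n_cells = 2000
cell_area = 1.0 / n_cells
env = np.linspace(-2.0, 2.0, n_cells)

# Simulate presence-only records from a log-linear Poisson process
# (intercept and slope are assumptions for the demo).
true_beta = np.array([6.0, 1.0])
counts = rng.poisson(np.exp(true_beta[0] + true_beta[1] * env) * cell_area)
presence_env = np.repeat(env, counts)  # environment at each record


def fit_ppm(presence_env, quad_env, quad_weights, n_iter=50):
    """Maximise sum(log intensity at records) - sum(weight * intensity at
    quadrature points) by Newton-Raphson iterations."""
    Xp = np.column_stack([np.ones_like(presence_env), presence_env])
    Xq = np.column_stack([np.ones_like(quad_env), quad_env])
    # Start from a homogeneous process: records per unit area, no slope.
    beta = np.array([np.log(len(presence_env) / quad_weights.sum()), 0.0])
    for _ in range(n_iter):
        weighted_intensity = quad_weights * np.exp(Xq @ beta)
        gradient = Xp.sum(axis=0) - Xq.T @ weighted_intensity
        hessian = (Xq * weighted_intensity[:, None]).T @ Xq
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# (a) Use every cell as a quadrature point.
beta_all = fit_ppm(presence_env, env, np.full(n_cells, cell_area))

# (b) Use 500 cells sampled at random; weights still sum to the total area.
n_quad = 500
idx = rng.choice(n_cells, size=n_quad, replace=False)
beta_quad = fit_ppm(presence_env, env[idx], np.full(n_quad, 1.0 / n_quad))

print("slope, all cells:       ", round(float(beta_all[1]), 3))
print("slope, 500 random cells:", round(float(beta_quad[1]), 3))
```

Note that the quadrature points carry no claim about absence: they only describe which environmental conditions are available in the study area. Fewer quadrature points make the fit cheaper but the approximation of the integral noisier.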

In a PPM, provided that the points are not strongly biased, the resulting intensity of the point process can be used as a proxy for habitat suitability (under the assumption that the species is more easily recorded where habitat suitability is higher) (Renner et al. 2015). An important point is that the results of any PPM (and, in general, of any presence-only model) should not be interpreted as a probability of occurrence, but as a relative occurrence rate (following Merow, Smith, and Silander 2013).
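A small numerical sketch may help show why the output is only relative. The intensity values below are made up, the cells are assumed to have equal area, and `relative_occurrence_rate` is a hypothetical helper that rescales intensities so that they sum to one across the study area (as is done, for example, in Maxent's raw output).

```python
import numpy as np

# Hypothetical fitted intensities for six equal-area grid cells.
intensity = np.array([0.5, 2.0, 8.0, 4.0, 1.0, 0.5])


def relative_occurrence_rate(intensity):
    """Rescale intensities so that they sum to 1 across the study area."""
    return intensity / intensity.sum()


ror = relative_occurrence_rate(intensity)

# If twice as much recording effort had been spent everywhere, the fitted
# intensity would be roughly twice as high in every cell ...
ror_double_effort = relative_occurrence_rate(2.0 * intensity)

# ... but the relative occurrence rate would be unchanged.
print(ror)
print(np.allclose(ror, ror_double_effort))
```

Because the overall level of the intensity depends on how much recording effort there was, only the ratios between cells carry information about suitability. Here the third cell is four times as suitable as the second under either level of effort, but neither value is a probability that the species occurs there.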

PPMs are closely related to the widely used Maxent (Renner and Warton 2013). Indeed, even pseudo-absence (or presence-background) modeling can approximate a Poisson point process model under certain conditions (Renner et al. 2015; Warton and Shepherd 2010).

Barbet-Massin, Morgane, Frédéric Jiguet, Cécile Hélène Albert, and Wilfried Thuiller. 2012. “Selecting Pseudo-Absences for Species Distribution Models: How, Where and How Many?: How to Use Pseudo-Absences in Niche Modelling?” Methods in Ecology and Evolution 3 (2): 327–38. https://doi.org/10.1111/j.2041-210X.2011.00172.x.
Elith, Jane, and John R. Leathwick. 2009. “Species Distribution Models: Ecological Explanation and Prediction Across Space and Time.” Annual Review of Ecology, Evolution, and Systematics 40 (1): 677–97. https://doi.org/10.1146/annurev.ecolsys.110308.120159.
Fithian, William, and Trevor Hastie. 2013. “Finite-Sample Equivalence in Statistical Models for Presence-Only Data.” The Annals of Applied Statistics 7 (4). https://doi.org/10.1214/13-AOAS667.
Lombardo, Luigi, Thomas Opitz, and Raphaël Huser. 2018. “Point Process-Based Modeling of Multiple Debris Flow Landslides Using INLA: An Application to the 2009 Messina Disaster.” Stochastic Environmental Research and Risk Assessment 32 (7): 2179–98. https://doi.org/10.1007/s00477-018-1518-0.
Merow, Cory, Matthew J. Smith, and John A. Silander. 2013. “A Practical Guide to MaxEnt for Modeling Species’ Distributions: What It Does, and Why Inputs and Settings Matter.” Ecography 36 (10): 1058–69. https://doi.org/10.1111/j.1600-0587.2013.07872.x.
Peterson, A. Townsend, Jorge Soberón, Richard G. Pearson, Robert P. Anderson, Enrique Martínez-Meyer, Miguel Nakamura, and Miguel B. Araújo. 2011. Ecological Niches and Geographic Distributions (MPB-49). 1st ed. Princeton University Press. https://doi.org/10.23943/princeton/9780691136868.001.0001.
Renner, Ian W., Jane Elith, Adrian Baddeley, William Fithian, Trevor Hastie, Steven J. Phillips, Gordana Popovic, and David I. Warton. 2015. “Point Process Models for Presence-Only Analysis.” Edited by Robert B. O’Hara. Methods in Ecology and Evolution 6 (4): 366–79. https://doi.org/10.1111/2041-210X.12352.
Renner, Ian W., and David I. Warton. 2013. “Equivalence of MAXENT and Poisson Point Process Models for Species Distribution Modeling in Ecology.” Biometrics 69 (1): 274–81. https://doi.org/10.1111/j.1541-0420.2012.01824.x.
Warton, David I., and Leah C. Shepherd. 2010. “Poisson Point Process Models Solve the Pseudo-Absence Problem for Presence-Only Data in Ecology.” The Annals of Applied Statistics 4 (3). https://doi.org/10.1214/10-AOAS331.

  1. Note that SDM is a broad term, covering many statistical approaches used to model the distribution of species according to environmental variables. Some methods are capable of estimating the probability of occurrence of a species, while others offer just a relative suitability score.

  2. This is an important aspect: models are designed to help us understand and capture general trends, providing insights into natural phenomena or enabling predictions. However, models are not perfect; many fine-scale aspects that contribute to the true (and often unknown) distribution of species are not fully represented.