Introduction to Occupancy Models

Occupancy models are a type of hierarchical model

Holup, what’s a hierarchical model?

Hierarchical models are a sequence of related probability models that are usually ordered by their conditional probability structure. This means that the probability structure in one part of the model is dependent on the probability structure in the previous part of the model. The parts of a hierarchical model are linked through conditionally dependent random variables, which is a confusing way of saying they are linked through something called latent variables.

Latent Variables

As ecologists, we often want to make inferences about the state of nature. What’s the distribution of Yellow-rumped Warblers? How many Spotted Skunks are on Santa Cruz Island? Those states of nature cannot be observed directly, but instead arise from processes that are hidden. If we were all-knowing gods who could see where every Yellow-rumped Warbler was on Earth at any given time, we wouldn’t have a problem, but because we are lowly ecologists, we must make inferences about these unobservable states and hidden processes through quantities that we can observe. We refer to unobservable states of nature as latent, and we attempt to estimate unobservable, latent quantities in hierarchical models.

Why should you use an occupancy model?

Folks who use occupancy models are concerned with something called detection probability. Detection probability is the probability that you detect a species during a survey, given that it is present at the site. If you wanted to model a species distribution without using an occupancy model, you would have to assume your ability to detect the species was absolutely perfect, because in a non-occupancy species distribution model, there is no way to account for less-than-perfect detection. This means that the species’ distribution will almost always be underestimated: every occupied site where you failed to detect the species gets counted as an absence. Occupancy models account for imperfect detection by splitting occupancy and detection probability into two sub-models (hence the hierarchical structure). There are other problems with not accounting for imperfect detection, but I will not get into those now (see Kery 2010 Chapter 20 for an excellent discussion).
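To see why ignoring detection probability pulls estimates downward, here is a minimal simulation sketch. Everything in it is made up for illustration: the occupancy probability `psi`, the detection probability `p`, and the numbers of sites and visits are assumed values, not estimates from any real data.

```r
# Illustrative simulation only: all parameter values below are assumptions
set.seed(42)

n_sites  <- 1000  # hypothetical number of sites
n_visits <- 3     # hypothetical number of visits per site
psi      <- 0.6   # assumed true occupancy probability
p        <- 0.4   # assumed per-visit detection probability

# Latent state of nature: 1 = site occupied, 0 = site unoccupied
z <- rbinom(n_sites, size = 1, prob = psi)

# Observed data: a detection can only happen at an occupied site.
# The matrix is filled column by column, so z is repeated once per visit.
y <- matrix(
  rbinom(n_sites * n_visits, size = 1, prob = rep(z, times = n_visits) * p),
  nrow = n_sites, ncol = n_visits
)

# Proportion of sites that are truly occupied
true_occupancy <- mean(z)

# "Naive" occupancy: proportion of sites with at least one detection
naive_occupancy <- mean(rowSums(y) > 0)

# What the naive estimate targets on average: psi * (1 - (1 - p)^n_visits)
# With these assumed values: 0.6 * (1 - 0.6^3) = 0.4704
expected_naive <- psi * (1 - (1 - p)^n_visits)

c(true = true_occupancy, naive = naive_occupancy, expected_naive = expected_naive)
```

Because detections only occur at occupied sites, the naive proportion can never exceed the true proportion of occupied sites, and it falls further below it as `p` or the number of visits gets smaller.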

An occupancy model has two sub-models:

  1. One part describes the thing we are interested in (some true state of nature)
  2. The second part describes the measurement error (this is where the actual data is and where detection probability is modeled)
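One common way to write these two parts in symbols is sketched below. The notation is an assumption introduced here for illustration: $z_i$ is the latent occupancy state of site $i$, $\psi$ is the occupancy probability, $y_{ij}$ is the observed detection (1) or non-detection (0) at site $i$ on visit $j$, and $p$ is the detection probability.

```latex
\begin{align*}
z_i &\sim \mathrm{Bernoulli}(\psi)
  && \text{(true state: is site } i \text{ occupied?)} \\
y_{ij} \mid z_i &\sim \mathrm{Bernoulli}(z_i \, p)
  && \text{(observation: was the species detected at site } i \text{ on visit } j\text{?)}
\end{align*}
```

When $z_i = 0$, the detection probability $z_i \, p$ is zero, so an unoccupied site can only produce zeros (this sketch assumes no false positives). When $z_i = 1$, each visit yields a detection with probability $p$, which is how the two parts are linked through the latent variable.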

Occupancy models: the natural choice for modeling species distributions

Let’s build an occupancy model

Imagine you are a bird biologist. You have 10 sites that you visit three times during the breeding season (May to July), and you record the presence or absence of Warbling Vireos at each site. You are interested in determining if vegetation height affects Warbling Vireo occupancy. You suspect that Warbling Vireos prefer taller vegetation.

Vireo gilvus

Your data might look something like this, where 1 = detected and 0 = not detected, with one column for each of the three visits.

library(tidyverse)  # provides read_csv()

# Detection histories: one row per site, one column per visit
df1 <- read_csv("data/fake_data1.csv")

head(df1)
## # A tibble: 6 x 4
##   Site  `Visit 1` `Visit 2` `Visit 3`
##   <chr>     <dbl>     <dbl>     <dbl>
## 1 A             1         1         0
## 2 B             0         1         0
## 3 C             1         1         1
## 4 D             1         1         1
## 5 E             1         1         0
## 6 F             0         0         0
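Before fitting anything, it can help to summarise the detection histories by hand. The sketch below retypes the six rows printed above into a base R data frame so that it runs without the CSV file; the column names `visit_1` to `visit_3` and the object names are stand-ins chosen for this example, not the names in the original file.

```r
# The six rows shown by head(df1), retyped by hand (column names are stand-ins)
det <- data.frame(
  Site    = c("A", "B", "C", "D", "E", "F"),
  visit_1 = c(1, 0, 1, 1, 1, 0),
  visit_2 = c(1, 1, 1, 1, 1, 0),
  visit_3 = c(0, 0, 1, 1, 0, 0)
)

# Keep only the visit columns as a numeric matrix
visits <- as.matrix(det[, c("visit_1", "visit_2", "visit_3")])

# Number of visits with a detection at each site
n_detections <- setNames(rowSums(visits), det$Site)

# Was the species detected at least once at each site?
ever_detected <- as.integer(n_detections > 0)

# "Naive" occupancy: share of these six sites with at least one detection (5 of 6)
naive_occupancy <- mean(ever_detected)

n_detections
naive_occupancy
```

This naive proportion treats every all-zero row, such as site F, as a true absence, which is exactly the assumption an occupancy model relaxes.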

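There is likely some error in your ability to detect a species, though. Site B is a good example: the vireo was missed on visits 1 and 3 but detected on visit 2, so (assuming the site stayed occupied for the whole season) those two zeros were missed detections rather than true absences.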

Here’s a fun example. Could you find the leopard if you were looking through your binoculars at this mountainside?