A unified approach to ecological modeling in Stan

Published May 5, 2024

Introduction

The development of Bayesian statistical software has had a considerable influence on statistical ecology because many ecological models are complex and require custom likelihoods that were not necessarily available in off-the-shelf software such as R or Program Mark (Cooch and White 2008, Kéry and Royle 2015, Kéry and Royle 2020). One of the reasons ecological models are so complex is that the data are often messy. For instance, in something like a randomised controlled trial, careful design means you can make stronger assumptions, such as that measurements are obtained without error. However, ecological data are noisy, and we frequently require complicated observation models that account for imperfect detection of species or individuals. As a result, a large class of ecological models can be formulated as hidden Markov models (HMMs), where time-series data are modeled with a (partially or completely) unobserved ecological process and an observation model conditioned on it. Examples of ecological HMMs include mark-recapture, occupancy, and \(N\)-mixture models, including their more complex multistate variants.

HMMs are straightforward to model with non-gradient based Markov chain Monte Carlo (MCMC) methods because the latent ecological states can be sampled like parameters. Formulating these models became accessible to ecologists because of the simplicity afforded by statistical software like BUGS, JAGS, and NIMBLE (de Valpine et al. 2017). Consider the simple mark-recapture model where individuals \(i \in 1 : I\) first captured at time \(t = f_i\) survive between times \(t \in f_i: T\) with survival probability \(\phi\) and are recaptured on each occasion with detection probability \(p\). The state-space formulation of this model from \(t_{(f_i + 1):T}\) with vague priors for \(\phi\) and \(p\) is as follows:

\[ \begin{aligned} z_{i,t} &\sim \textrm{Bernoulli} \left( z_{i,t-1} \times \phi \right) \\ y_{i,t} &\sim \textrm{Bernoulli} \left( z_{i,t} \times p \right) \\ \phi, p &\sim \textrm{Beta} \left( 1, 1 \right) \end{aligned} \tag{1}\]

where \(z_{i,t}\) are the partially observed ecological states (where \(z=1\) is an alive individual and \(z=0\) is a dead individual) and \(y_{i,t}\) are the observed data (where \(y=1\) is an observed individual and \(y=0\) is an unobserved individual). Using NIMBLE 1.1.0 in R 4.3.2 (R Core Team 2023), we can code this model straightforwardly, following the algebra above closely.

Code
library(nimble)
cmr_code <- nimbleCode({
  # likelihood
  for (i in 1:n_ind) {
    # initial state is known
    z[i, first[i]] <- y[i, first[i]]
    # subsequent surveys
    for (t in (first[i] + 1):n_surv) {
      z[i, t] ~ dbern(z[i, t - 1] * phi)
      y[i, t] ~ dbern(z[i, t] * p)
    } # t
  } # i
  # priors
  phi ~ dbeta(1, 1)
  p ~ dbeta(1, 1)
})

After simulating some data, creating a NIMBLE model from the code, and configuring the MCMC, we can see that each unknown z[i, t] gets its own binary MCMC sampler.

Code
# metadata
n_ind <- 100
n_surv <- 8

# parameters
phi <- 0.6
p <- 0.7

# containers
z <- y <- z_known <- matrix(NA, n_ind, n_surv)
first <- sample(1:(n_surv - 1), n_ind, replace = T) |> sort()
last <- numeric(n_ind)

# simulation
for (i in 1:n_ind) {
  z[i, first[i]] <- y[i, first[i]] <- 1
  for (t in (first[i] + 1):n_surv) {
    z[i, t] <- rbinom(1, 1, z[i, t - 1] * phi)
    y[i, t] <- rbinom(1, 1, z[i, t] * p)
  } # t
  last[i] <- max(which(y[i, ] == 1))
  z_known[i, first[i]:last[i]] <- 1
} # i

# create NIMBLE model object
Rmodel <- nimbleModel(cmr_code, 
                      constants = list(n_ind = n_ind, n_surv = n_surv, first = first),
                      data = list(y = y, z = z_known))

# configure MCMC
conf <- configureMCMC(Rmodel)
===== Monitors =====
thin = 1: p, phi
===== Samplers =====
RW sampler (2)
  - phi
  - p
binary sampler (263)
  - z[]  (263 elements)

Note that because we know with certainty that individuals were alive between the first and last surveys on which they were captured, we can ease computation by supplying those z[i, first[i]:last[i]] as known. We’ll run this model to confirm that it’s able to recover our input parameters, summarising with MCMCvis (Youngflesh 2018).

Code
# compile and build MCMC
Cmodel <- compileNimble(Rmodel)
Cmcmc <- buildMCMC(conf) |> compileNimble(project = Cmodel, resetFunctions = T)

# fit model
n_iter <- 1e3 ; n_chains <- 4
fit_cmr_nimble <- runMCMC(Cmcmc, niter = n_iter * 2, nburnin = n_iter, nchains = n_chains)
Code
# summarise
library(MCMCvis)
summary_cmr_nimble <- MCMCsummary(fit_cmr_nimble, params = c("phi", "p")) |>
  as_tibble(rownames = "variable") |>
  mutate(truth = c(phi, p))
summary_cmr_nimble
# A tibble: 2 × 9
  variable  mean     sd `2.5%` `50%` `97.5%`  Rhat n.eff truth
  <chr>    <dbl>  <dbl>  <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
1 phi      0.660 0.0360  0.586 0.660   0.727  1.02   633   0.6
2 p        0.764 0.0487  0.669 0.765   0.861  1.02   429   0.7

As flexible as Bayesian software like BUGS and JAGS are, most of them do not use Hamiltonian Monte Carlo (HMC, Neal 1994) to generate samples from the posterior distribution (but see nimbleHMC for a recent implementation in NIMBLE). Without going into detail myself (see Betancourt (2018)), HMC is generally considered a superior algorithm, and it moreover gives warnings when something goes wrong in the sampling, alerting the user to potential issues in the estimation. As a result, HMC should generally be preferred by practitioners, but it has one trait that can be challenging for ecologists in particular: HMC requires that all parameters are continuous, meaning that our typical state-space formulations of HMMs cannot be fit with HMC due to the presence of latent discrete parameters like our ecological states z[i, t] above. The remainder of this post will focus on fitting complex ecological models using Stan (Carpenter et al. 2017), a probabilistic programming language which implements a state-of-the-art No-U-Turn Sampler (NUTS, Hoffman and Gelman 2014). I present a unified approach that is applicable to simple and complex models alike, highlighted with three examples of increasing complexity.

Marginalisation

Since HMC cannot sample discrete parameters, we have to reformulate our models without the latent states through a process called marginalisation. Marginalisation essentially just enumerates the mutually exclusive ways an observation could have arisen and sums their probabilities, since \(\Pr(A \cup B) = \Pr(A) + \Pr(B)\) for mutually exclusive events. For instance, in the above mark-recapture example, consider an individual that was captured on occasion \(t_1\) and recaptured on occasion \(t_2\). The only way this could have happened is that the individual survived the interval \(t_{1:2}\) with probability \(\phi\) and was recaptured on occasion \(t_2\) with probability \(p\), making the marginal likelihood of that datapoint \(\phi \times p\). However, if it was not observed at time \(t_{2}\), two things are possible: either the animal survived but was not recaptured, with probability \(\phi \times (1 - p)\), or the individual died, with probability \(1 - \phi\), and was thus not recaptured. Summing these probabilities gives us the marginal likelihood for that datapoint, which is \(\phi \times (1 - p) + (1 - \phi)\). Models can be marginalised in many different ways, but in this post I’m going to focus on the forward algorithm. Although it takes some getting used to, the forward algorithm facilitates fitting a wide variety of (increasingly complex) HMMs.
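
This arithmetic is easy to verify in R with the parameter values from the simulation above (phi = 0.6, p = 0.7); note that the two marginal likelihoods sum to 1, as they cover the only two possible observations at \(t_2\) for an animal alive at \(t_1\).

Code
# captured at t1 and recaptured at t2: survived and was detected
phi * p
[1] 0.42
Code
# captured at t1 but not recaptured at t2: survived undetected, or died
phi * (1 - p) + (1 - phi)
[1] 0.58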

Forward algorithm

First, I’m going to restructure our mark-recapture model with the introduction of transition probability matrices (TPMs). Formulating ecological models this way means we can apply the same strategy for a wide variety of models. In the mark-recapture example, we still have two latent ecological states: (1) dead and (2) alive. The transitions from one state to the other is given by the following TPM, where the states of departure (time \(t - 1\)) are in the rows and states of arrival (time \(t\)) are in the columns (note that the rows of TPMs must sum to 1):

\[ \mathrm{TPM}_z = \begin{bmatrix} 1 & 0 \\ 1 - \phi & \phi \end{bmatrix} \tag{2}\]

Here, the dead state is an absorbing state and individuals remain in the alive states with probability \(\phi\). The observation process still deals with two observed states, (1) not detected and (2) detected, and is also formulated as a \(2 \times 2\) TPM, where the ecological states are in the rows (states of departure) and the observed states in the columns (states of arrival):

\[ \mathrm{TPM}_y = \begin{bmatrix} 1 & 0 \\ 1 - p & p \end{bmatrix} \tag{3}\]

The forward algorithm works by computing the marginal likelihood of an individual being in each state \(s \in 1 : S\) at time \(t\) given the observations up to time \(t\), stored in a \(T \times S\) matrix we’ll call \(\mathrm{MPS}\) (Marginal Probability per State). From here on, the notation for subsetting row \(i\) of a matrix \(M\) is \(M_i\) and \(M_{:j}\) for subsetting column \(j\). The forward algorithm starts from initial state probabilities, which in the mark-recapture example are known, given that we know with certainty that every individual is alive at first capture:

\[ \mathrm{MPS}_{f_i} = \left[ 0, 1 \right] \]

For subsequent occasions after an individual’s first capture \(t_{(f_i+1):T}\), the marginal probabilities of being in state \(s\) at time \(t\), given the observations up to and including time \(t\), are computed as:

\[ \mathrm{MPS}_{t} = \mathrm{MPS}_{t-1} \times \mathrm{TPM}_z \odot {\mathrm{TPM}_y}_{[:y_{i, t}]}^\intercal \]

That is, matrix multiplying the marginal probabilities at time \(t-1\) with the ecological TPM and then element-wise multiplying (where \(\odot\) denotes the Hadamard or element-wise product) the resulting row-vector with the transposed relevant column of the observation TPM, where the first column is used if the individual was not observed (\(y_{i,t} = 1\)) and the second column if it was observed (\(y_{i,t} = 2\)). The forward algorithm continues until survey \(T\) after which the log density is incremented with the log of the sum of \(\mathrm{MPS}_T\), the probabilities of being in each state at the end of the study period.
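
To make the forward algorithm concrete before translating it to Stan, here’s a small R sketch that computes the marginal likelihood of a single, made-up capture history (the names y_i and mps are just illustrative), reusing phi and p from the simulation:

Code
# TPMs from Equations 2 and 3
tpm_z <- matrix(c(1, 0, 1 - phi, phi), 2, byrow = TRUE)
tpm_y <- matrix(c(1, 0, 1 - p, p), 2, byrow = TRUE)

# capture history: first captured at t = 1, recaptured at t = 3 only (1 = no, 2 = yes)
y_i <- c(2, 1, 2, 1)

# forward algorithm: known alive at first capture, then propagate and condition
mps <- matrix(NA, length(y_i), 2)
mps[1, ] <- c(0, 1)
for (t in 2:length(y_i)) {
  mps[t, ] <- (mps[t - 1, ] %*% tpm_z) * tpm_y[, y_i[t]]
}

# marginal likelihood of this capture history
sum(mps[length(y_i), ])
[1] 0.043848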

Example 1: Basic mark-recapture as a HMM

As a first example of structuring ecological models as HMMs with matrix multiplication, I have translated the initial mark-recapture model for use in Stan. I wrote two versions: one with probabilities, which most closely resembles the above algebra, and one with log probabilities, which is what Stan naturally works with. Doing so involves swapping the multiplication of probabilities for the addition of log probabilities, changing log(sum()) to log_sum_exp(), and performing matrix multiplication on the log scale with a custom function log_product_exp(), which I found on Stack Overflow (note that the function is overloaded to account for the dimensions of the two inputs; see the short R demonstration after this list). I have a few comments:

  • For computational efficiency with column-major ordering in Stan (also true for R, BUGS, and NIMBLE, for that matter), the above linear algebra operations are happening in reverse with a transposed \(\mathrm{TPM}_z\) and \(\mathrm{MPS}\). The ecological TPM is still constructed as above but transposed afterwards.
  • I am getting into the habit of initialising \(\mathrm{MPS}_t\) with the probabilities associated with the observation process (that is, the lowest hierarchical level of the HMM), as this is the only part of the model that is directly conditioned on the observed data. These marginal detection probabilities can then be incremented with the ecological process, where the .*= operator in Stan element-wise multiplies mps[:, t] with the expression on the right of the operator. Structuring even the simplest model this way makes understanding the more complicated ones coming up more manageable.
  • Just like in the NIMBLE version, we don’t have to worry about the probabilities associated with being dead between the first and last capture. Note that for the occasions between first and last capture, I only compute the marginal probabilities of the alive state, fixing those of the dead state to 0 (or negative_infinity() for log).
  • Even though these models don’t include the latent ecological states as discrete parameters, the most likely state sequence can still be recovered using the Viterbi algorithm. This is performed in the generated quantities block, and for details I recommend studying the Stan User’s Guide. Note that in the simple mark-recapture example we already know that individuals were alive in state 2 between the first and last occasion of capture, so the Viterbi algorithm is only implemented from (last[i] + 1):n_surv.
  • In all models, I also compute the log likelihood in the vector log_lik for potential use in model checking using the loo package (Vehtari et al. 2023).
  • Most parts of the model block are repeated in generated quantities. Although this makes the whole program more verbose, it is more efficient to compute derived quantities in the generated quantities block as they are only computed once per HMC iteration, whereas the transformed parameters and the model blocks get updated with each gradient evaluation, of which there can be many per iteration.
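
Before diving into the Stan programs, here’s a quick R demonstration that matrix multiplication on the log scale, using log_sum_exp() over sums of log probabilities, recovers ordinary matrix multiplication; this is all that log_product_exp() below does:

Code
# stable log-scale matrix multiplication: log(exp(la) %*% exp(lb))
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
log_product_exp <- function(la, lb) {
  out <- matrix(NA, nrow(la), ncol(lb))
  for (i in 1:nrow(la)) {
    for (j in 1:ncol(lb)) {
      out[i, j] <- log_sum_exp(la[i, ] + lb[, j])
    }
  }
  out
}

# check against the direct computation
A <- matrix(runif(4), 2)
B <- matrix(runif(4), 2)
all.equal(log_product_exp(log(A), log(B)), log(A %*% B))
[1] TRUE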

Stan programs

Code
data {
  int<lower=1> n_ind, n_surv;
  array[n_ind] int<lower=1, upper=n_surv-1> first;
  array[n_ind] int<lower=first, upper=n_surv> last;
  array[n_ind, n_surv] int<lower=1, upper=2> y;
}

parameters {
  real<lower=0, upper=1> phi, p;
}

model {
  // TPM containers
  matrix[2, 2] tpm_z, tpm_y;
  
  // ecological TPM (transpose once)
  tpm_z[1, 1] = 1;
  tpm_z[1, 2] = 0;
  tpm_z[2, 1] = 1 - phi;
  tpm_z[2, 2] = phi;
  tpm_z = tpm_z';
  
  // observation TPM
  tpm_y[1, 1] = 1;
  tpm_y[1, 2] = 0;
  tpm_y[2, 1] = 1 - p;
  tpm_y[2, 2] = p;
  
  // marginal probabilities per state
  matrix[2, n_surv] mps;
  
  // likelihood
  for (i in 1:n_ind) {
    
    // initial state probabilities
    mps[:, first[i]] = [ 0, 1 ]';
    
    // from first to last capture, only alive states conditioned on data
    if (first[i] < last[i]) {
      for (t in (first[i] + 1):last[i]) {
        
        // initialise marginal probability with observation process
        mps[2, t] = tpm_y[2, y[i, t]];
        
        // increment with ecological process
        mps[2, t] *= tpm_z[2, 2] * mps[2, t - 1];
      } // t
    }
    
    // ensure dead state at last capture is impossible
    mps[1, last[i]] = 0;
    
    // if individual was not captured on last survey
    if (last[i] < n_surv) {
      
      // after last capture, condition both states on data
      for (t in (last[i] + 1):n_surv) {
        
        // initialise marginal probabilities with observation process
        mps[:, t] = tpm_y[:, y[i, t]];
        
        // increment with ecological process (matrix multiplication)
        mps[:, t] .*= tpm_z * mps[:, t - 1];
      } // t
    }
    
    // increment log density
    target += log(sum(mps[:, n_surv]));
    
  } // i
  
  // priors
  target += beta_lupdf(phi | 1, 1);
  target += beta_lupdf(p | 1, 1);
}

generated quantities {
  array[n_ind, n_surv] int z;
  vector[n_ind] log_lik;
  {
    matrix[2, 2] tpm_z, tpm_y;
    tpm_z[1, 1] = 1;
    tpm_z[1, 2] = 0;
    tpm_z[2, 1] = 1 - phi;
    tpm_z[2, 2] = phi;
    tpm_z = tpm_z';
    tpm_y[1, 1] = 1;
    tpm_y[1, 2] = 0;
    tpm_y[2, 1] = 1 - p;
    tpm_y[2, 2] = p;
    array[2, n_surv] int back_ptr;
    matrix[2, n_surv] mps, best_p;
    real tmp;
    int tt;
    
    for (i in 1:n_ind) {
    
      // log-lik
      mps[:, first[i]] = [ 0, 1 ]';
      if (first[i] < last[i]) {
        for (t in (first[i] + 1):last[i]) {
          mps[2, t] = tpm_y[2, y[i, t]];
          mps[2, t] *= tpm_z[2, 2] * mps[2, t - 1];
        } // t
      }
      mps[1, last[i]] = 0;
      
      // Viterbi
      best_p[:, last[i]] = [ 0, 1 ]';
      if (last[i] < n_surv) {
        best_p[:, (last[i] + 1):n_surv] = rep_matrix(0, 2, n_surv - last[i]);
        for (t in (last[i] + 1):n_surv) {
          // marginal observation probabilities
          mps[:, t] = tpm_y[:, y[i, t]];
          for (s_a in 1:2) {  // state of arrival
            for (s_d in 1:2) { // state of departure
              tmp = best_p[s_d, t - 1] * tpm_z[s_a, s_d] * mps[s_a, t];
              if (tmp > best_p[s_a, t]) {
                back_ptr[s_a, t] = s_d;
                best_p[s_a, t] = tmp;
              }
            } // s_d
          } // s_a
          
          // increment with ecological process for log-lik
          mps[:, t] .*= tpm_z * mps[:, t - 1];
        } // t
      }
      log_lik[i] = log(sum(mps[:, n_surv]));
      
      // ecological states
      z[i, first[i]:last[i]] = rep_array(2, last[i] - first[i] + 1);
      if (last[i] < n_surv) {
        tmp = max(best_p[:, n_surv]);
        for (s_a in 1:2) {
          if (best_p[s_a, n_surv] == tmp) {
            z[i, n_surv] = s_a;
          }
        } // s_a
        if (last[i] < (n_surv - 1)) {
          for (t in (last[i] + 1):(n_surv - 1)) {
            tt = n_surv - t + last[i] + 1;
            z[i, tt - 1] = back_ptr[z[i, tt], tt];
          } // t
        }
      }
    } // i
  }
}
Code
functions{
  /**
   * Return the natural logarithm of the product of the element-wise exponentiation of the specified matrices
   *
   * @param a  First matrix or (row_)vector
   * @param b  Second matrix or (row_)vector
   *
   * @return   log(exp(a) * exp(b))
   */
  matrix log_product_exp(matrix a, matrix b) {
    int x = rows(a);
    int y = cols(b);
    int z = cols(a);
    matrix[z, x] a_tr = a';
    matrix[x, y] c;
    for (j in 1:y) {
      for (i in 1:x) {
        c[i, j] = log_sum_exp(a_tr[:, i] + b[:, j]);
      }
    }
    return c;
  }
  vector log_product_exp(matrix a, vector b) {
    int x = rows(a);
    int z = cols(a);
    matrix[z, x] a_tr = a';
    vector[x] c;
    for (i in 1:x) {
      c[i] = log_sum_exp(a_tr[:, i] + b);
    }
    return c;
  }
  row_vector log_product_exp(row_vector a, matrix b) {
    int y = cols(b);
    vector[size(a)] a_tr = a';
    row_vector[y] c;
    for (j in 1:y) {
      c[j] = log_sum_exp(a_tr + b[:, j]);
    }
    return c;
  }
  real log_product_exp(row_vector a, vector b) {
    real c = log_sum_exp(a' + b);
    return c;
  }
}

data {
  int<lower=1> n_ind, n_surv;
  array[n_ind] int<lower=1, upper=n_surv-1> first;
  array[n_ind] int<lower=first, upper=n_surv> last;
  array[n_ind, n_surv] int<lower=1, upper=2> y;
}

parameters {
  real<lower=0, upper=1> phi, p;
}

model {
  // log TPM containers
  matrix[2, 2] ltpm_z, ltpm_y;
  
  // ecological ltpm
  ltpm_z[1, 1] = 0;
  ltpm_z[1, 2] = negative_infinity();
  ltpm_z[2, 1] = log1m(phi);
  ltpm_z[2, 2] = log(phi);
  ltpm_z = ltpm_z';
  
  // observation ltpm
  ltpm_y[1, 1] = 0;
  ltpm_y[1, 2] = negative_infinity();
  ltpm_y[2, 1] = log1m(p);
  ltpm_y[2, 2] = log(p);
  
  // marginal log probabilities per state
  matrix[2, n_surv] lmps;
  
  // likelihood
  for (i in 1:n_ind) {
    
    // initial state log probabilities
    lmps[:, first[i]] = [ negative_infinity(), 0 ]';
    
    // from first to last capture, only alive states conditioned on data
    if (first[i] < last[i]) {
      for (t in (first[i] + 1):last[i]) {
        
        // initialise marginal log probability with observation process
        lmps[2, t] = ltpm_y[2, y[i, t]];
        
        // increment with ecological process
        lmps[2, t] += ltpm_z[2, 2] + lmps[2, t - 1];
      } // t
    }
    
    // ensure dead state at last capture is impossible
    lmps[1, last[i]] = negative_infinity();
    
    // if individual was not captured on last survey
    if (last[i] < n_surv) {
      
      // after last capture, condition both states on data
      for (t in (last[i] + 1):n_surv) {
        
        // initialise marginal log probabilities with observation process
        lmps[:, t] = ltpm_y[:, y[i, t]];
        
        // increment with ecological process (log matrix multiplication)
        lmps[:, t] += log_product_exp(ltpm_z, lmps[:, t - 1]);
      } // t
    }
    
    // increment log density
    target += log_sum_exp(lmps[:, n_surv]);
    
  } // i
  
  // priors
  target += beta_lupdf(phi | 1, 1);
  target += beta_lupdf(p | 1, 1);
}

generated quantities {
  array[n_ind, n_surv] int z;
  vector[n_ind] log_lik;
  {
    matrix[2, 2] ltpm_z, ltpm_y, trm;
    ltpm_z[1, 1] = 0;
    ltpm_z[1, 2] = negative_infinity();
    ltpm_z[2, 1] = log1m(phi);
    ltpm_z[2, 2] = log(phi);
    ltpm_z = ltpm_z';
    ltpm_y[1, 1] = 0;
    ltpm_y[1, 2] = negative_infinity();
    ltpm_y[2, 1] = log1m(p);
    ltpm_y[2, 2] = log(p);
    array[2, n_surv] int back_ptr;
    matrix[2, n_surv] lmps, best_lp;
    real tmp;
    int tt;
    
    for (i in 1:n_ind) {
      
      // log-lik
      lmps[:, first[i]] = [ negative_infinity(), 0 ]';
      if (first[i] < last[i]) {
        for (t in (first[i] + 1):last[i]) {
          lmps[2, t] = ltpm_y[2, y[i, t]];
          lmps[2, t] += ltpm_z[2, 2] + lmps[2, t - 1];
        } // t
      }
      lmps[1, last[i]] = negative_infinity();
      
      // Viterbi
      best_lp[:, last[i]] = [ negative_infinity(), 0 ]';
      if (last[i] < n_surv) {
        best_lp[:, (last[i] + 1):n_surv] = rep_matrix(negative_infinity(), 2, n_surv - last[i]);
        for (t in (last[i] + 1):n_surv) {
          // marginal observation log probabilities
          lmps[:, t] = ltpm_y[:, y[i, t]];
          for (s_a in 1:2) {  // state of arrival
            for (s_d in 1:2) { // state of departure
              tmp = best_lp[s_d, t - 1] + ltpm_z[s_a, s_d] + lmps[s_a, t];
              if (tmp > best_lp[s_a, t]) {
                back_ptr[s_a, t] = s_d;
                best_lp[s_a, t] = tmp;
              }
            } // s_d
          } // s_a
          
          // increment with ecological process for log-lik
          lmps[:, t] += log_product_exp(ltpm_z, lmps[:, t - 1]);
        } // t
      }
      log_lik[i] = log_sum_exp(lmps[:, n_surv]);
      
      // ecological states
      z[i, first[i]:last[i]] = rep_array(2, last[i] - first[i] + 1);
      if (last[i] < n_surv) {
        tmp = max(best_lp[:, n_surv]);
        for (s_a in 1:2) {
          if (best_lp[s_a, n_surv] == tmp) {
            z[i, n_surv] = s_a;
          }
        } // s_a
        if (last[i] < (n_surv - 1)) {
          for (t in (last[i] + 1):(n_surv - 1)) {
            tt = n_surv - t + last[i] + 1;
            z[i, tt - 1] = back_ptr[z[i, tt], tt];
          } // t
        }
      }
    } // i
  }
}

Why we like HMC

Using CmdStanR 0.7.1 to call CmdStan 2.34.1 (Gabry et al. 2023), we’ll just run the log probability model for the same number of iterations as the NIMBLE version, and we see that there were no issues with sampling (Stan would tell us otherwise) and the effective sample sizes (ESS) were several times higher than for the NIMBLE model. This is why we want to use HMC.

Code
# data for Stan
cmr_data <- list(n_ind = n_ind, n_surv = n_surv, 
                 first = first, last = last, 
                 y = y + 1) |>
  # no NAs in Stan
  sapply(\(x) replace(x, is.na(x), 1))

# run HMC
fit_cmr <- cmr_lp$sample(data = cmr_data, refresh = 0,
                         chains = n_chains, parallel_chains = n_chains, iter_warmup = n_iter, iter_sampling = n_iter)
Running MCMC with 4 parallel chains...

Chain 1 finished in 2.3 seconds.
Chain 2 finished in 2.5 seconds.
Chain 3 finished in 2.5 seconds.
Chain 4 finished in 2.4 seconds.

All 4 chains finished successfully.
Mean chain execution time: 2.4 seconds.
Total execution time: 2.6 seconds.
Code
# summary
library(tidyverse)
fit_cmr$summary(c("phi", "p")) |> 
  select(variable, median, contains("q"), contains("ess")) |>
  mutate(ess_nimble = summary_cmr_nimble$n.eff,
         truth = c(phi, p))
# A tibble: 2 × 8
  variable median    q5   q95 ess_bulk ess_tail ess_nimble truth
  <chr>     <dbl> <dbl> <dbl>    <dbl>    <dbl>      <dbl> <dbl>
1 phi       0.662 0.599 0.720    2325.    2402.        633   0.6
2 p         0.763 0.673 0.840    2163.    2446.        429   0.7

And just to confirm, we check if the Viterbi algorithm was able to closely recover the latent alive states.

Code
# generate estimated z matrix
z_est <- fit_cmr$summary("z")$median |>
  matrix(n_ind, n_surv) |> 
  {\(z) ifelse(z < 1, NA, ifelse(z == 1, 0, 1))}()

# what proportion of latent z states correspond with the truth?
n_unknown <- n_surv - last
correct <- numeric(n_ind)
for (i in 1:n_ind) {
  if (n_unknown[i] > 0) {
    unknowns <- (last[i] + 1):n_surv
    correct[i] <- sum((z_est == z)[i, unknowns])
  }
}
sum(correct) / sum(n_unknown)
[1] 0.966

Example 2: Dynamic multistate occupancy model

In order to highlight the generality of the above modeling approach, we’re going to switch gears to a dynamic occupancy model. These models are also just HMMs, with some differences from the basic mark-recapture model that can seem daunting at first but turn out to be fairly minor:

  • There can be more than two states.
  • The initial state probabilities here are unknown, that is, we don’t know the occupancy state of a site when it’s first surveyed.
  • State transitions can happen in more than one direction, whereas dead individuals in mark-recapture can never transition to alive.

Consider a study area containing sites \(i \in 1 : I\) surveyed across \(t \in 1 : T\) years. We are investigating whether sites are occupied by tiger snakes (Notechis scutatus) in each year, but want to differentiate between sites that simply support tiger snakes and those that support breeding. Our ecological process thus has three possible states: (1) unoccupied by tiger snakes, (2) occupied by tiger snakes but not for breeding, or (3) occupied by breeding tiger snakes. The observation process also has three possible states: (1) no snakes found, (2) snakes found but no evidence of breeding, and (3) snakes found with evidence of breeding, as determined by the discovery of gravid females. Instead of directly formulating the ecological TPM here, we’ll work in continuous time using a transition rate matrix (TRM).

A rainforest tiger snake (Notechis scutatus) from northern New South Wales, Australia.
Ecology occurs in continuous time

TPMs assume equal intervals between surveys, but this is the exception rather than the norm for ecological data. Although a survival probability can easily be raised to the power of the time interval, this doesn’t work for processes where states can transition in more than one direction, such as snakes moving in and out of sites. Unlike individual survival, sites are free to transition between states from year to year. Even with simpler models, however, it often makes more sense to model the survival process with mortality hazard rates (where the survival probability \(S\) is related to the mortality hazard rate \(h\) as \(S = \exp(-h)\)) for a number of reasons (Ergon et al. 2018). The multistate extension of this is straightforward: we instead take the matrix exponential of a TRM (Glennie et al. 2022). By constructing our transition matrices as TRMs, where the off-diagonals contain instantaneous hazard rates and the diagonals contain negative sums of these off-diagonals (ensuring rows sum to 0, not 1), we can generate time-specific TPMs by taking the matrix exponential of the TRM multiplied by the time interval between occasions (\(\tau_t\)), thereby essentially marginalising over possible state transitions that occurred during survey intervals:

\[ \mathrm{TPM}_t = \exp \left( \mathrm{TRM} \times \tau_t \right) \tag{4}\]

A TRM for the tiger snake example might look something like this, with again states of departure (year \(t-1\)) in the rows and states of arrival (year \(t\)) in the columns:

\[ \mathrm{TRM}_z = \begin{bmatrix} -(\gamma_1 + \gamma_2) & \gamma_1 & \gamma_2 \\ \epsilon_1 & -(\epsilon_1 + \psi_1) & \psi_1 \\ \epsilon_2 & \psi_2 & -(\epsilon_2 + \psi_2) \end{bmatrix} \tag{5}\] with the following parameters:

  • \(\gamma_1\): the “non-breeding” colonisation rate, or the rate at which a site becomes occupied with non-breeding snakes at time \(t\) when the site was not occupied at time \(t-1\).
  • \(\gamma_2\): the “breeding” colonisation rate, or the rate at which a site becomes occupied with breeding snakes at time \(t\) when the site was not occupied at time \(t-1\).
  • \(\epsilon_1\): the “non-breeding” emigration rate, or the rate at which a site becomes unoccupied at time \(t\) when it was occupied with non-breeding snakes at time \(t-1\).
  • \(\epsilon_2\): the “breeding” emigration rate, or the rate at which a site becomes unoccupied at time \(t\) when it was occupied with breeding snakes at time \(t-1\).
  • \(\psi_1\): the breeding rate, or the rate at which a site becomes used for breeding at time \(t\) when the site was occupied but not used for breeding at time \(t-1\).
  • \(\psi_2\): the breeding cessation rate, or the rate at which a site stops being used for breeding at time \(t\) when the site was used for breeding at time \(t-1\).

Note that even if these parameters aren’t exactly the quantities of interest, derived quantities of them are easily computed as generated quantities after parameter estimation. Of course, these parameters can be modeled with additional complexity using linear or non-linear effects at the level of years, sites, or both, with the only change being that the TRMs are constructed at those levels (for instance, by creating site- and year-level TRMs where the parameters populating the TRM are allowed to vary at that level; see Example 3 for an application).

In addition to the TRM/TPM, the ecological process is initiated on the first occasion with an initial state vector, giving the probabilities of being in each state at \(t_1\), which have to sum to 1:

\[ \boldsymbol{\eta} = \left[ \eta_1 , \eta_2 , \eta_3 \right] \tag{6}\]

Simulating continuous time

Below, I’ll simulate the ecological data by specifying a TRM and using the expm package to take the matrix exponential to generate the TPM (Goulet et al. 2021). In order to simulate ecological states z[i, t] we use the categorical distribution, which is just the Bernoulli distribution generalised to multiple outcomes. Note that because we’re simulating the ecological process from year to year, the time intervals \(\tau\) are just 1; however, in the Stan program below, we will account for missing years using tau.

Code
# metadata
n_site <- 100
n_year <- 10
n_sec <- 4

# parameters
eta <- c(NA, 0.3, 0.4)       # unoccupied/non-breeding/breeding initial probabilities
eta[1] <- 1 - sum(eta[2:3])  # ensure probabilities sum to 1 (simplex) 
gamma <- c(0.3, 0.2)         # non-breeding/breeding colonisation rates
epsilon <- c(0.4, 0.1)       # non-breeding/breeding emigration rates
psi <- c(0.4, 0.3)           # breeding/breeding cessation rates

# function for random categorical draw
rcat <- function(n, prob) {
  return(
    rmultinom(n, size = 1, prob) |> 
      apply(2, \(x) which(x == 1))
  )
}

# TRM
trm <- matrix(c(-(gamma[1] + gamma[2]), gamma[1], gamma[2],
                epsilon[1], -(epsilon[1] + psi[1]), psi[1],
                epsilon[2], psi[2], -(epsilon[2] + psi[2])),
              3, byrow = T)
trm
     [,1] [,2] [,3]
[1,] -0.5  0.3  0.2
[2,]  0.4 -0.8  0.4
[3,]  0.1  0.3 -0.4
Code
# here's the TPM
library(expm)
tpm_z <- expm(trm)
tpm_z
      [,1]  [,2]  [,3]
[1,] 0.650 0.182 0.168
[2,] 0.231 0.515 0.254
[3,] 0.101 0.182 0.717
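
A reassuring property of Equation 4 is that the TPM for a two-year interval is just the one-year TPM applied twice, which is exactly how the matrix exponential marginalises over the unobserved state in a skipped year:

Code
# expm(trm * 2) marginalises over the unobserved state at the skipped year
all.equal(expm(trm * 2), tpm_z %*% tpm_z)
[1] TRUE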
Code
# containers
z <- matrix(NA, n_site, n_year)

# ecological process simulation
for (i in 1:n_site) {
  
  # initial states
  z[i, 1] <- rcat(1, eta)
  
  # subsequent years
  for (t in 2:n_year) {
    z[i, t] <- rcat(1, tpm_z[z[i, t - 1], ])
  } # t
} # i

# some ecological states
head(z, 10)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    1    3    1    2    2    3    1    1     1
 [2,]    3    2    3    3    3    3    3    3    2     2
 [3,]    2    2    2    2    3    3    3    3    3     3
 [4,]    2    2    2    2    2    3    1    1    1     1
 [5,]    1    1    1    1    1    1    2    3    3     3
 [6,]    1    2    2    3    1    1    2    3    1     1
 [7,]    2    2    3    3    3    3    1    2    2     1
 [8,]    3    3    3    3    3    3    3    2    3     3
 [9,]    1    2    2    3    3    2    2    2    1     3
[10,]    3    2    2    1    1    1    2    1    1     1

Observation process

Although it’s possible to construct the observation model in continuous time, it often makes more sense in discrete time, especially if discrete surveys were conducted rather than continuous monitoring with something like an automated recording device. For instance, with dynamic occupancy models, an investigator would conduct multiple “secondary” surveys within a season or year in order to disentangle the ecological process from the observation process.

Robust Design

With multistate models where individuals or sites can transition between ecological states, we need to conduct multiple consecutive surveys within periods of assumed closure in order to reliably estimate transition rates and detection probabilities, yielding secondary surveys nested within primary occasions. In the absence of this, there are parameter identifiability issues for the state transition rates and state-specific detection probabilities. This survey strategy was originally called the “robust design” (Pollock 1982), and the necessity of applying it to multistate models more generally has gone largely unappreciated by ecologists.

As in the mark-recapture example, we construct an observation TPM where the ecological states are in the rows (states of departure) and observed states in the columns (states of arrival), recognising that there’s uncertainty in being able to detect breeding:

\[ \mathrm{TPM}_y = \begin{bmatrix} 1 & 0 & 0 \\ 1 - p_1 & p_1 & 0 \\ 1 - p_2 & p_2 \times (1 - \delta) & p_2 \times \delta \end{bmatrix} \tag{7}\] with the following parameters:

  • \(p_{1:2}\): the state-specific detection probabilities, or the probabilities of detecting tiger snakes if they are present but not breeding (\(p_1\)) or present and breeding (\(p_2\)).
  • \(\delta\): the breeding detection probability, or the probability of detecting tiger snake breeding conditional on it occurring at a site; in this example, breeding is detected when gravid females are found. Note that false-positive detections of any kind (erroneous snake identification, etc.) are not considered here, but will be in Example 3.

I’ll simulate some observational data below where the investigators aimed to conduct 4 secondary surveys per year (primary occasion), all conducted during the tiger snake breeding season, with some secondary surveys missed, and some years missed altogether.

Code
# parameters
p <- c(0.4, 0.5)  # non-breeding/breeding detection probabilities
delta <- 0.6      # breeding detection probability

# TPM
tpm_y <- matrix(c(1, 0, 0,
                  1 - p[1], p[1], 0,
                  1 - p[2], p[2] * (1 - delta), p[2] * delta),
                3, byrow = T)

# containers
n_year_i <- numeric(n_site)
years <- n_sec_it <- matrix(NA, n_site, n_year)
secs <- array(NA, c(n_site, n_year, n_sec))
y <- array(NA, c(n_site, n_year, n_sec))

# simulation
for (i in 1:n_site) {
  
  # number of years and sequence of years surveyed (always surveyed in first year)
  n_year_i[i] <- 1 + rbinom(1, n_year - 1, 0.8)
  years[i, 1:n_year_i[i]] <- c(1, sample(2:n_year, n_year_i[i] - 1) |> sort())
  
  for (t in years[i, 1:n_year_i[i]]) {
    
    # number of secondaries per site and year and sequence of secondaries surveyed
    n_sec_it[i, t] <- 1 + rbinom(1, n_sec - 1, 0.8)
    secs[i, t, 1:n_sec_it[i, t]] <- sample(1:n_sec, n_sec_it[i, t]) |> sort()
    
    # observation process
    y[i, t, secs[i, t, 1:n_sec_it[i, t]]] <- rcat(n_sec_it[i, t], tpm_y[z[i, t], ])
  } # t
} # i

I recognise that some of the indexing here is daunting, but it just makes sure that our data are contained in the right parts of the detection arrays and that Stan isn’t going to have to deal with NAs in the data, which is less straightforward than it is in JAGS and NIMBLE.

Indexing

In JAGS and NIMBLE we don’t really have to worry about NAs because MCMC samplers are assigned to unobserved data, but that is inefficient. Things aren’t so simple in Stan, and we’re better off just pointing the model to where the observations sit in the data arrays. In the above example, n_year_i is an n_site length vector holding the number of surveyed years per site. years is an n_site * n_year matrix with the sequence of years observed, which for n_year_i[i] = 6 might look like years[i, ] = [1, 3, 4, 6, 7, 8, NA, NA, NA, NA]. The NAs will have to be replaced with some other value by the time they get fed to Stan, but you don’t have to worry about them because they won’t get used anyway. n_sec is the maximum number of secondary surveys, and n_sec_it[i, t] holds the number of secondary surveys conducted per site and year. Just like years, secs holds the sequence of specific secondaries that were surveyed for that site and year. Putting it together, for each value t of years[i, 1:n_year_i[i]], y[i, t, secs[i, t, 1:n_sec_it[i, t]]] selects the observations of a site only for the secondaries in the years that were surveyed. In the Stan program below, this subset of y is used to select the observations to extract the associated probabilities out of the observation log TPM.
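
To make this concrete, here’s a small R illustration for a single hypothetical site surveyed in years 1 and 3 (of 4), with up to 3 secondaries; the *_i objects are made-up, single-site versions of the arrays above:

Code
# hypothetical site: 4 years, up to 3 secondaries per year
y_i <- matrix(NA, 4, 3)        # detections for one site
y_i[1, 1:3] <- c(1, 2, 1)      # year 1: secondaries 1-3 surveyed
y_i[3, c(1, 3)] <- c(2, 3)     # year 3: secondaries 1 and 3 surveyed
years_i <- c(1, 3)             # surveyed years (n_year_i = 2)
n_sec_it_i <- c(3, NA, 2, NA)  # number of secondaries surveyed per year
secs_i <- matrix(NA, 4, 3)     # which secondaries were surveyed
secs_i[1, 1:3] <- 1:3
secs_i[3, 1:2] <- c(1, 3)

# only surveyed years/secondaries are ever touched, so the NAs are never used
for (t in years_i) {
  print(y_i[t, secs_i[t, 1:n_sec_it_i[t]]])
}
[1] 1 2 1
[1] 2 3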

Stan program

Below is the Stan program for the above model written with log probabilities. In many ways it is similar to the mark-recapture example, with the following noteworthy changes:

  • The site- and year-specific time intervals tau[i, t] between surveyed years are computed in the transformed data block. Additionally, a logical q is computed indicating whether or not sites were surveyed in each year.
  • All entries in the \(3 \times T\) log MPS lmps are initialised with the observation process. Note that to account for the replicate secondary surveys, the relevant log probabilities of each secondary are subsetted from the observation TPM for each possible ecological state. For example, consider a detection vector for site \(i\), year \(t\), and secondaries \(1:K_{i,t}\) of y[i, t, secs[i, t, 1:n_sec_it[i, t]]] = [2, 3, 1, 2], showing that snakes were observed during secondaries 1, 2, and 4, and that a gravid female was only observed on the second occasion. The detection vector corresponding to ecological state 3 (present and breeding) would be \(\mathrm{TPM}_{3,[2, 3, 1, 2]} = \left[ p_2 \times (1 - \delta), p_2 \times \delta, 1 - p_2, p_2 \times (1 - \delta) \right]^\intercal\). The marginal probability of the observations is the product of these probabilities, or the sum of the log probabilities. Note that the detection vectors corresponding to the other two ecological states contain a 0, because it’s impossible to observe breeding snakes at an unoccupied site or one occupied only by non-breeders, making those marginal probabilities 0.
  • Because the TPM varies by site and year due to the unequal time intervals tau[i, t], the log TPM of the ecological process gets overwritten in each site- and year-level loop in the model block. Note that in the generated quantities block a different approach is used, where the same yearly TRM is used but lmps is only incremented with observation log probabilities in years that are surveyed for each site.
  • As in Example 1, the log likelihood and Viterbi algorithm are performed in the generated quantities block, but the Viterbi algorithm has been modified for this specific model.
Code
functions{
  /**
   * Return the natural logarithm of the product of the element-wise exponentiation of the specified matrices
   *
   * @param a  First matrix or (row_)vector
   * @param b  Second matrix or (row_)vector
   *
   * @return   log(exp(a) * exp(b))
   */
  matrix log_product_exp(matrix a, matrix b) {
    int x = rows(a);
    int y = cols(b);
    int z = cols(a);
    matrix[z, x] a_tr = a';
    matrix[x, y] c;
    for (j in 1:y) {
      for (i in 1:x) {
        c[i, j] = log_sum_exp(a_tr[:, i] + b[:, j]);
      }
    }
    return c;
  }
  vector log_product_exp(matrix a, vector b) {
    int x = rows(a);
    int z = cols(a);
    matrix[z, x] a_tr = a';
    vector[x] c;
    for (i in 1:x) {
      c[i] = log_sum_exp(a_tr[:, i] + b);
    }
    return c;
  }
  row_vector log_product_exp(row_vector a, matrix b) {
    int y = cols(b);
    vector[size(a)] a_tr = a';
    row_vector[y] c;
    for (j in 1:y) {
      c[j] = log_sum_exp(a_tr + b[:, j]);
    }
    return c;
  }
  real log_product_exp(row_vector a, vector b) {
    real c = log_sum_exp(a' + b);
    return c;
  }
}

data {
  int<lower=1> n_site, n_year, n_sec;
  array[n_site] int<lower=1, upper=n_year> n_year_i;
  array[n_site, n_year] int<lower=1, upper=n_year> years;
  array[n_site, n_year] int<lower=1, upper=n_sec> n_sec_it;
  array[n_site, n_year, n_sec] int<lower=1, upper=n_sec> secs;
  array[n_site, n_year, n_sec] int<lower=1, upper=3> y;
}

transformed data {
  array[n_site, n_year - 1] int tau;
  array[n_site, n_year] int<lower=0, upper=1> q = rep_array(0, n_site, n_year);
  for (i in 1:n_site) {
    for (t in 2:n_year_i[i]) {
      tau[i, t - 1] = years[i, t] - years[i, t - 1];
    }
    // surveys in that year?
    for (t in 1:n_year) {
      for (t_i in years[i, 1:n_year_i[i]]) {
        if (t == t_i) {
          q[i, t] = 1;
        }
      } // t_i
    } // t
  } // i
}

parameters {
  simplex[3] eta;
  vector<lower=0>[2] gamma, epsilon, psi;
  vector<lower=0, upper=1>[2] p;
  real<lower=0, upper=1> delta;
}

model {
  // TRM and log TPM containers
  matrix[3, 3] trm, ltpm_z, ltpm_y;
  
  // pre-compute log initial state vector
  vector[3] log_eta = log(eta);
  
  // ecological TRM (transpose once)
  trm[1, 1] = -(gamma[1] + gamma[2]);
  trm[1, 2] = gamma[1];
  trm[1, 3] = gamma[2];
  trm[2, 1] = epsilon[1];
  trm[2, 2] = -(epsilon[1] + psi[1]);
  trm[2, 3] = psi[1];
  trm[3, 1] = epsilon[2];
  trm[3, 2] = psi[2];
  trm[3, 3] = -(epsilon[2] + psi[2]);
  trm = trm';
  
  // pre-compute log detection probabilities
  vector[2] log_p = log(p);
  vector[2] log1m_p = log1m(p);
  
  // observation log TPM
  ltpm_y[1, 1] = 0;
  ltpm_y[1, 2] = negative_infinity();
  ltpm_y[1, 3] = negative_infinity();
  ltpm_y[2, 1] = log1m_p[1];
  ltpm_y[2, 2] = log_p[1];
  ltpm_y[2, 3] = negative_infinity();
  ltpm_y[3, 1] = log1m_p[2];
  ltpm_y[3, 2] = log_p[2] + log1m(delta);
  ltpm_y[3, 3] = log_p[2] + log(delta);
  
  // marginal log probabilities per state
  matrix[3, n_year] lmps;
  int tt;
  
  // likelihood
  for (i in 1:n_site) {
    
    // initialise marginal log probabilities with observation model in each year
    for (t in 1:n_year_i[i]) {
      tt = years[i, t];  // map to correct year
      for (s in 1:3) {
        lmps[s, t] = sum(ltpm_y[s, y[i, tt, secs[i, tt, 1:n_sec_it[i, tt]]]]);
      } // s
    } // t
    
    // increment initial state log probabilities
    lmps[:, 1] += log_eta;
    
    // for subsequent years
    for (t in 2:n_year_i[i]) {
      
      // compute log TPM from TRM and tau
      ltpm_z = log(matrix_exp(trm * tau[i, t - 1]));
      
      // increment marginal log detection probabilities with ecological process
      lmps[:, t] += log_product_exp(ltpm_z, lmps[:, t - 1]);
    }
    
    // increment log density
    target += log_sum_exp(lmps[:, n_year_i[i]]);
    
  } // i
  
  // priors
  target += dirichlet_lupdf(eta | ones_vector(3));
  target += exponential_lupdf(gamma | 1);
  target += exponential_lupdf(epsilon | 1);
  target += exponential_lupdf(psi | 1);
  target += beta_lupdf(p | 1, 1);
  target += beta_lupdf(delta | 1, 1);
}

generated quantities {
  array[n_site, n_year] int z;
  vector[n_site] log_lik;
  {
    matrix[3, 3] trm, ltpm_z, ltpm_y;
    vector[3] log_eta = log(eta);
    trm[1, 1] = -(gamma[1] + gamma[2]);
    trm[1, 2] = gamma[1];
    trm[1, 3] = gamma[2];
    trm[2, 1] = epsilon[1];
    trm[2, 2] = -(epsilon[1] + psi[1]);
    trm[2, 3] = psi[1];
    trm[3, 1] = epsilon[2];
    trm[3, 2] = psi[2];
    trm[3, 3] = -(epsilon[2] + psi[2]);
    trm = trm';
    ltpm_z = log(matrix_exp(trm)); // yearly log TPM (only one needed)
    vector[2] log_p = log(p);
    vector[2] log1m_p = log1m(p);
    ltpm_y[1, 1] = 0;
    ltpm_y[1, 2] = negative_infinity();
    ltpm_y[1, 3] = negative_infinity();
    ltpm_y[2, 1] = log1m_p[1];
    ltpm_y[2, 2] = log_p[1];
    ltpm_y[2, 3] = negative_infinity();
    ltpm_y[3, 1] = log1m_p[2];
    ltpm_y[3, 2] = log_p[2] + log1m(delta);
    ltpm_y[3, 3] = log_p[2] + log(delta);
    array[3, n_year] int back_ptr;
    matrix[3, n_year] lmps, best_lp;
    real tmp;
    int tt;
    
    for (i in 1:n_site) {
      
      // initialise probabilities
      lmps = rep_matrix(0, 3, n_year);
      best_lp = rep_matrix(negative_infinity(), 3, n_year);
      
      // first year and observation log probabilities
      lmps[:, 1] = log_eta;
      for (t in 1:n_year) {
        if (q[i, t]) {
          for (s in 1:3) {
            lmps[s, t] += sum(ltpm_y[s, y[i, t, secs[i, t, 1:n_sec_it[i, t]]]]);
          } // s
        }
      } // t
      
      // Viterbi
      best_lp[:, 1] = lmps[:, 1];
      for (t in 2:n_year) {
        for (s_a in 1:3) {  // state of arrival
          for (s_d in 1:3) { // state of departure
            if (q[i, t]) {
              tmp = best_lp[s_d, t - 1] + ltpm_z[s_a, s_d] + lmps[s_a, t];
            } else {
              // ignore observation process if not surveyed
              tmp = best_lp[s_d, t - 1] + ltpm_z[s_a, s_d];
            }
            if (tmp > best_lp[s_a, t]) {
              back_ptr[s_a, t] = s_d;
              best_lp[s_a, t] = tmp;
            }
          } // s_d
        } // s_a
        
        // increment with ecological process for log-lik
        lmps[:, t] += log_product_exp(ltpm_z, lmps[:, t - 1]);
      } // t
      log_lik[i] = log_sum_exp(lmps[:, n_year]);
      
      // ecological states
      tmp = max(best_lp[:, n_year]);
      for (s_a in 1:3) {
        if (best_lp[s_a, n_year] == tmp) {
          z[i, n_year] = s_a;
        }
      } // s_a
      for (t in 1:(n_year - 1)) {
        tt = n_year - t + 1;
        z[i, tt - 1] = back_ptr[z[i, tt], tt];
      } // t
    } // i
  }
}

We’ll run Stan and visually summarise the parameter estimates, with the input plotted alongside in green. It looks like we did alright, but there is a lot of uncertainty, largely because sites can move among 3 states from year to year. For this reason, dynamic occupancy models are notoriously data-hungry. Often the initial state parameters are particularly difficult to estimate, and they are sometimes best parameterised as the steady state vector of the TPM. I haven’t quite worked out how to implement this efficiently in Stan.

Code
# data for Stan
occ_data <- list(n_site = n_site, n_year = n_year, n_sec = n_sec,
                 n_year_i = n_year_i, years = years, n_sec_it = n_sec_it,
                 secs = secs, y = y) |> 
  sapply(\(x) replace(x, is.na(x), 1))

# run HMC
fit_dyn <- occ_dyn_ms$sample(data = occ_data, refresh = 0, init = 0.1,
                             chains = n_chains, parallel_chains = n_chains, iter_sampling = n_iter)
Running MCMC with 4 parallel chains...

Chain 4 finished in 211.4 seconds.
Chain 2 finished in 211.6 seconds.
Chain 1 finished in 221.6 seconds.
Chain 3 finished in 227.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 217.9 seconds.
Total execution time: 227.4 seconds.

Results

Code
# visualise estimates
fit_dyn$summary(c("eta", "gamma", "epsilon", "psi", "p", "delta")) |>
  mutate(truth = c(eta, gamma, epsilon, psi, p, delta),
         parameter = c(rep("eta", 3), 
                       rep(c("gamma", "epsilon", "psi", "p"), each = 2), 
                       "delta") |>
           factor(levels = c("eta", "gamma", "epsilon", "psi", "p", "delta")),
         variable = factor(variable) |> fct_reorder(as.numeric(parameter)),
         process = c(rep("Ecological Process", 3 + 3 * 2), 
                     rep("Observation Process", 3)) |>
           factor()) |>
  ggplot(aes(median, fct_rev(variable))) +
  facet_wrap(~ process, scales = "free_y", ncol = 1) +
  geom_pointrange(aes(xmin = q5, xmax = q95), position = position_nudge(y = 1/10)) +
  geom_point(aes(truth), colour = green, position = position_nudge(y = -1/10)) +
  scale_x_continuous(breaks = seq(0.2, 1, 0.2), expand = c(0, 0)) +
  scale_y_discrete(labels = ggplot2:::parse_safe) +
  labs(x = "Posterior", y = "Parameter")

Let’s also check how many of our ecological states were correctly recovered. Since only state 3 (snakes present and breeding) can be directly observed with certainty, we’ll only check the ecological states for those site-years where no breeding was observed. We certainly didn’t do as well as in the mark-recapture example, which isn’t too surprising given the small number of sites and the complexity of the model.

Code
# maximum observed state
z_obs <- apply(y, 1:2, \(y) ifelse(sum(is.na(y)) == n_sec, 
                                   NA, max(y, na.rm = T)))

# estimated ecological states
z_est <- fit_dyn$summary("z")$median |>
  matrix(n_site, n_year)

# proportion of ecological states correctly estimated
unknowns <- z_obs < 3 | is.na(z_obs)
sum((z_est == z)[unknowns]) / sum(unknowns)
[1] 0.688
Steady state distribution

ChatGPT tells me that to get the steady state distribution from a TPM, you perform the eigendecomposition of the transposed TPM, and the eigenvector corresponding to the eigenvalue of 1, normalised to sum to 1, is the steady state distribution. Stan does have eigendecomposition, so I’m sure there’s a way to parameterise the initial states nicely, but it gets more complicated with time- and/or individual-varying parameters.
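
Here’s a minimal R sketch of that recipe using the yearly tiger snake TPM from the simulation:

Code
# steady state distribution of the yearly TPM
ev <- eigen(t(tpm_z))
steady <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
steady <- steady / sum(steady)

# the steady state is unchanged by another year of transitions
all.equal(as.numeric(steady %*% tpm_z), steady)
[1] TRUE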

Example 3: Disease-structured mark-recapture with state misclassification

The main reason I dove so deep into HMMs is because during my PhD I ran into a problem with my mark-recapture study on Fleay’s barred frogs (Mixophyes fleayi). For two years I conducted mark-recapture surveys along Gondwana rainforest streams in northern New South Wales, Australia, swabbing every frog I found to test for the presence of the pathogenic amphibian chytrid fungus (Batrachochytrium dendrobatidis, Bd) (Scheele et al. 2019) to hopefully infer something about (1) the effect of Bd on frog mortality rates and (2) the infection dynamics.

Adult male Fleay’s or silverblue-eyed barred frog (Mixophyes fleayi) from Border Ranges National Park, New South Wales, Australia.

The ecological TRM of such a disease-structured multistate model is pretty straightforward, with three possible states: (1) alive and uninfected, (2) alive and infected, and (3) dead, which is an absorbing state:

\[ \mathrm{TRM}_z = \begin{bmatrix} -(\psi_1 + \phi_1) & \psi_1 & \phi_1 \\ \psi_2 & -(\psi_2 + \phi_2) & \phi_2 \\ 0 & 0 & 0 \end{bmatrix} \tag{8}\]

Here, \(\phi_{1:2}\) are the mortality hazard rates of uninfected and infected frogs, respectively, and \(\psi_{1:2}\) are the rates of gaining and clearing infections, respectively.

Nothing is certain

While I was running qPCRs in the lab on some swabs collected during high frequency surveys (1 week intervals) with large numbers of recaptures, I noticed some frogs were apparently frequently clearing and re-gaining infections. For instance, a frog captured on 5 successive weeks may have looked something like this: \(\left[ 2, 2, 1, 2, 1 \right]\), implying the infection was lost twice and gained once across 4 weeks. This seemed unlikely, and it struck me as more probable that I was simply getting false negatives. At the same time, I couldn’t rule out false positives, either, given that contamination is always a risk. The best way to account for this uncertainty is to model it.

I realised that if you conduct robust design sampling and collect a swab with each capture, you could essentially model the swabbing process like an occupancy model within a mark-recapture model. That is, imagine surveying frogs \(i \in 1:I\) during \(t \in 1:T\) primary occasions (say every 2 months). During each primary occasion, you conduct multiple secondary surveys \(k \in 1:K\) within a short amount of time where you’re willing to assume the population is closed, that is, no animals dying, coming in, changing their infection state, etc. Every time you capture a frog, including those captured multiple times within a primary, you collect a swab. There are 3 possible observed states (\(o\)) per secondary: (1) frog captured with swab Bd–, (2) frog captured with swab Bd+, and (3) frog not captured. The TPM of this observation process for repeat secondaries with false negatives and false positives is as follows, with ecological states in the rows and observed states in the columns:

\[ \mathrm{TPM}_o = \begin{bmatrix} p_1 \times ( 1 - \lambda_1) & p_1 \times \lambda_1 & 1 - p_1 \\ p_2 \times (1 - \delta_1) & p_2 \times \delta_1 & 1 - p_2 \\ 0 & 0 & 1 \end{bmatrix} \tag{9}\] Where \(p_{1:2}\) are the detection probabilities of uninfected and infected frogs, respectively, \(\lambda_1\) is the probability of getting a Bd+ swab from an uninfected frog (a false positive), and \(\delta_1\) is the probability of detecting Bd on a swab from an infected frog (where \(1 - \delta_1\) is the false negative probability). The Bd detection parameters \(\delta_1\) and \(\lambda_1\) are identified if at least some frogs are recaptured multiple times within a primary occasion.

Double-decking the HMM

This is already a lot, but it’s really no different from Example 2: a continuous-time ecological process (simpler this time, because the dead state is absorbing) and a discrete-time observation process, albeit with an extra false positive parameter. However, the reality of the data was slightly more complex. It is conventional to run qPCR several times per swab to accurately quantify the DNA load present in a sample. But this is analogous to the swabbing process described above: presumably, it’s possible to get false negatives and false positives in the qPCR diagnostic process as well. There are thus three diagnostic states (\(y\), the data): (1) qPCR replicate Bd–, (2) qPCR replicate Bd+, or (3) no qPCR performed, because no frog was captured and thus no swab was collected. Typically, people will just average their measured DNA loads across runs and apply an inclusion criterion, such as considering a sample infected when 2/3 qPCR replicates return Bd DNA. But we can do better: we can just model it. Here’s the TPM of the diagnostic process, with observed states in the rows and diagnostic states in the columns:

\[ \mathrm{TPM}_y = \begin{bmatrix} 1 - \lambda_2 & \lambda_2 & 0 \\ 1 - \delta_2 & \delta_2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{10}\] where \(\lambda_2\) is another false positive parameter and \(\delta_2\) is the detection probability in the qPCR process.

Always try to use all of the data

Another element of this model involved incorporating infection intensity. qPCR doesn’t just tell you whether a sample is infected; it also tells you how many DNA copies are present in the sample. So with multiple qPCR replicates (\(x\)), \(l \in 1:L\), collected from multiple samples (\(n\)), we can actually estimate the latent infection intensities of both the individuals (\(m\)) and the samples:

\[ \begin{aligned} m_{i,t} &\sim \mathrm{Lognormal} \left( \mu, \sigma_1 \right) \\ n_{i,t,k} &\sim \mathrm{Lognormal} \left( m_{i,t}, \sigma_2 \right) \\ x_{i,t,k,l} &\sim \mathrm{Lognormal} \left( n_{i,t,k}, \sigma_3 \right) \end{aligned} \tag{11}\] where lognormals are used to ensure the infection intensities are positive, \(\mu\) is the log population average and \(\sigma_{1:3}\) are the population standard deviation and the errors of the swabbing and qPCR processes, respectively. We can incorporate the infection intensities to the ecological and observation models by modeling \(\phi_2\), the mortality rates of infected frogs, and \(\delta_{1:2}\), the Bd detection probabilities, as functions of the relevant infection intensities, for example like this:

\[ \begin{aligned} {\phi_2}_{i, t} &= \exp \left( \log {\phi_2}_\alpha + {\phi_2}_\beta \times m_{i,t} \right) \\ {\delta_1}_{i,t} &= 1 - (1 - r_1)^{m_{i,t}} \\ {\delta_2}_{i,t,k} &= 1 - (1 - r_2)^{n_{i,t,k}} \end{aligned} \tag{12}\] Here, the first equation is a simple GLM of the mortality hazard rate as a function of time-varying individual infection intensity, and the Bd detection probabilities are modeled as a function of \(r_{1:2}\), or the probabilities of detecting one log gene copy of DNA on a swab or qPCR replicate.
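
As a quick illustration of Equations 11 and 12, here’s a hedged R sketch simulating one individual’s intensities down the hierarchy. All parameter values are made up, and I’m reading the location parameters of the swab- and qPCR-level lognormals as log-medians (log(m) and log(n)), so that each level scatters around the one above:

Code
# illustrative parameters
mu <- 1.5                  # log population average intensity
sigma <- c(0.8, 0.4, 0.2)  # population, swab, and qPCR sds (Equation 11)
r <- c(0.3, 0.5)           # per-log-gene-copy detection probabilities (Equation 12)

# individual -> swab -> qPCR replicate intensities
m <- rlnorm(1, mu, sigma[1])                          # individual at primary t
n <- rlnorm(4, log(m), sigma[2])                      # swabs from 4 secondary captures
x <- sapply(n, \(n_k) rlnorm(3, log(n_k), sigma[3]))  # 3 qPCR replicates per swab

# intensity-dependent Bd detection probabilities
delta_1 <- 1 - (1 - r[1])^m  # swab-level detection for this individual
delta_2 <- 1 - (1 - r[2])^n  # qPCR-level detection for each swab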

Dr. Andy Royle and I published this model in Methods in Ecology and Evolution, where we initially implemented it in NIMBLE (Hollanders and Royle 2022). But of course, I was keen to try to marginalise out the latent ecological and observed states to fit this model in Stan (note that the ecological states are completely unobserved, as there’s no way of knowing whether an infected qPCR run actually came from an infected sample or frog!).

Stan implementation

This model took me a lot of time to get going in Stan, but I got there in the end. I won’t explain everything in detail, but I’ll mention a few key things here:

  • I implemented this model with threading (within-chain parallelisation), where the work within each HMC chain gets partitioned across (potentially) all of the cores on your computer. With this particular individual-level likelihood, this should have considerable speed benefits. Basically, I put as much of the model block as possible inside the partial_sum_lpmf() function, which then gets called by reduce_sum(). See the Stan User’s Guide for more information. In this example, a lot of things could have been pre-computed outside the individual loop, but the current formulation is ready for more individual-level complexity.
  • One thing I got hung up on for a while was how the observation and diagnostic processes should be coded. The trick is to start with the diagnostic process, that is, the lowest level of the hierarchy, on which the data are directly conditioned (as we have been doing). For each secondary survey \(k\) in primary \(t\) in which an individual was captured (using the indicator q created in the transformed data block), you sum the diagnostic log probabilities associated with the observed data (diagnostic states) for each possible (partially latent) observed state. Because these probabilities differ for each observed state, we need another object to hold the intermediate quantities, which I’ve called omega. The diagnostic log probability vectors are then (log) matrix multiplied with the observation (log) TPM (a toy version of this step is sketched after this list). If the frog was not captured during that secondary, lmps is simply incremented with the observation TPM probabilities associated with not being detected. From here, lmps is incremented with the ecological process as in the previous examples.
  • Because we only use the rows of the observation and diagnostic TPMs that are associated with alive and observed frogs, respectively, I don’t write out the full TPMs in the code.
  • Stan has some issue related to evaluating the gradient of 0 * log(0); I don’t know the details, but presumably the expression’s value can be handled while its gradient evaluates to NaN. In any case, I was getting errors unless I formulated the model in a particular way (I think that’s why I had to subset \(\mathrm{TPM}_z\) (ltpm_z) to exclude the dead state, but I’m not sure). Specifically, as in Example 3, I only increment lmps with the alive states until the last primary of capture. After that occasion, the dead state is incremented for the first time. For subsequent occasions, the dead state is incremented separately from the alive states. In the generated quantities for the Viterbi algorithm, I do need the full ltpm_z, as I’m interested in propagating the probabilities of the dead state.
  • The inclusion of false positives means the observation and diagnostic processes are examples of finite mixtures that may have multimodality in certain model constructions (Royle and Link 2006). Here, the true pathogen detection probabilities \(\delta\) are modeled as a function of infection intensity, which may be enough to avoid the multimodality. Just to be sure, however, I’ve constrained \(\lambda_{1:2}\) to be less than \(r_{1:2}\) in the program and further given fairly strong \(\mathrm{Beta} (1, 10)\) priors for \(\lambda\).
  • The individual infection intensities m[i, t] are parameterised with a centered lognormal distribution and the sample infection intensities n[i, t, k] with a non-centered lognormal, as this seemed to give the best HMC performance.
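Here’s a toy R version of the marginalisation step described in the second bullet, for one secondary in which a frog was captured and its swab run through qPCR three times (the probabilities are made up but follow the TPM structures above):

Code
log_sum_exp <- function(x) max(x) + log(sum(exp(x - max(x))))

# diagnostic log TPM (rows: observed Bd-/Bd+ states; columns: diagnostic Bd-/Bd+ states)
ltpm_y <- log(matrix(c(0.95, 0.05,
                       0.30, 0.70), 2, byrow = TRUE))

# diagnostic states of the three qPCR runs (2 = Bd+)
y_tk <- c(2, 1, 2)

# omega: log probability of the runs under each (partially latent) observed state
omega <- sapply(1:2, \(o) sum(ltpm_y[o, y_tk]))

# observation log TPM (rows: alive uninfected/infected; columns: observed Bd-/Bd+)
ltpm_o <- log(matrix(c(0.665, 0.035,
                       0.180, 0.420), 2, byrow = TRUE))

# log matrix multiplication: marginalise the observed state for each alive state
sapply(1:2, \(s) log_sum_exp(ltpm_o[s, ] + omega))

The result is that secondary’s contribution to lmps for the two alive states, which is exactly what log_product_exp() computes inside the Stan program.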

Simulation

Below I simulate some data where the rates of gaining and clearing infection are affected by temperature (negatively and positively, respectively), which seems to be the case for the frog-Bd system. To account for the varying infection dynamics in the state probabilities at first capture, I’ll use the steady state distribution for each primary, computed from the infection dynamics probabilities \({\psi_p}_{1:2} = 1 - \exp(-\psi_{1:2})\). The expected probability of being infected at first capture in each primary is then \(\eta = \frac{{\psi_p}_1}{{\psi_p}_1 + {\psi_p}_2}\).
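As a quick sanity check with hypothetical rates, \(\eta\) is indeed the stationary distribution of the two-state infection process (ignoring mortality):

Code
psi <- c(0.7, 0.4)            # hypothetical rates of gaining/clearing infection
psi_p <- 1 - exp(-psi)        # infection dynamics probabilities
eta <- psi_p[1] / sum(psi_p)  # expected probability of being infected

# two-state TPM of the infection process (rows/columns: uninfected, infected)
tpm <- matrix(c(1 - psi_p[1], psi_p[1],
                psi_p[2], 1 - psi_p[2]), 2, byrow = TRUE)
c(1 - eta, eta) %*% tpm  # returns (1 - eta, eta): the distribution is stationary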

Code
# metadata
n_ind <- 100
n_prim <- 8
n_sec <- 3
n_diag <- 3
tau <- rlnorm(n_prim - 1, log(1), 0.5)             # unequal primary occasion intervals
temp <- rnorm(n_prim - 1) |> sort(decreasing = T)  # it's getting colder

# parameters
phi_a <- c(0.1, 0.1)       # mortality rates of uninfected/infected frogs (intercepts)
phi_b <- 0.3               # effect of one log Bd gene copy on mortality
psi_a <- c(0.7, 0.4)       # rates of gaining/clearing infections (intercepts)
psi_b <- c(-0.4, 0.3)      # effect of temperature on rates of gaining/clearing infections
p_a <- c(0.7, 0.6)         # detection probabilities of uninfected/infected frogs
r <- c(0.4, 0.6)           # pathogen detection probabilities (per log Bd gene copy)
lambda <- c(0.05, 0.10)    # sampling/diagnostic false positive probabilities
mu <- 1.0                  # population log average infection intensity
sigma <- c(0.3, 0.2, 0.1)  # population and sampling/diagnostic SDs

# TPM containers
trm_z <- tpm_z <- array(NA, c(3, 3, n_ind, n_prim - 1))
tpm_o <- array(NA, c(3, 3, n_ind, n_prim, n_sec))
tpm_y <- array(NA, c(3, 3, n_ind, n_prim, n_sec))

# parameter containers
phi <- array(phi_a, c(2, n_ind, n_prim - 1))
psi <- exp(matrix(log(psi_a), 2, n_prim - 1) + matrix(psi_b, 2, n_prim - 1) * matrix(temp, 2, n_prim - 1, byrow = T))
psi_p <- 1 - exp(-psi)
eta <- psi_p[1, ] / apply(psi_p, 2, sum)
p <- array(p_a, c(2, n_ind, n_prim, n_sec))
m <- delta1 <- array(NA, c(n_ind, n_prim))
n <- delta2 <- array(NA, c(n_ind, n_prim, n_sec))

# primary occasions of first capture
first <- sort(sample(1:(n_prim - 1), n_ind, replace = T))

# TPMs
for (i in 1:n_ind) {
  
  # fix detection probabilities in first secondary of first primary to ensure capture
  p[, i, first[i], 1] <- 1
  
  for (t in first[i]:n_prim) {
    
    # individual infection intensity
    m[i, t] <- rlnorm(1, mu, sigma[1])
    
    # sample pathogen detection
    delta1[i, t] <- 1 - (1 - r[1]) ^ m[i, t]
    
    # sample infection intensities
    n[i, t, ] <- rlnorm(n_sec, log(m[i, t]), sigma[2])
    
    # diagnostic pathogen detection
    delta2[i, t, ] <- 1 - (1 - r[2]) ^ n[i, t, ]
    
    for (k in 1:n_sec) {
      
      # observation TPM
      tpm_o[, , i, t, k] <- matrix(c(p[1, i, t, k] * (1 - lambda[1]), p[1, i, t, k] * lambda[1], 1 - p[1, i, t, k],
                                     p[2, i, t, k] * (1 - delta1[i, t]), p[2, i, t, k] * delta1[i, t], 1 - p[2, i, t, k], 
                                     0, 0, 1), 
                                   3, byrow = T)
      
      # diagnostic TPM
      tpm_y[, , i, t, k] <- matrix(c(1 - lambda[2], lambda[2], 0, 
                                     1 - delta2[i, t, k], delta2[i, t, k], 0, 
                                     0, 0, 1), 
                                   3, byrow = T)

    } # k
  } # t
  
  for (t in first[i]:(n_prim - 1)) {
    
    # mortality rate as a function of infection intensity
    phi[2, i, t] <- exp(log(phi_a[2]) + phi_b * m[i, t])
    
    # ecological TRM/TPM
    trm_z[, , i, t] <- matrix(c(-(psi[1, t] + phi[1, i, t]), psi[1, t], phi[1, i, t], 
                                psi[2, t], -(psi[2, t] + phi[2, i, t]), phi[2, i, t], 
                                0, 0, 0), 
                              3, byrow = T)
    tpm_z[, , i, t] <- expm(trm_z[, , i, t] * tau[t])
    
  } # t
} # i

# containers
z <- array(NA, c(n_ind, n_prim))
o <- array(NA, c(n_ind, n_prim, n_sec))
y <- x <- array(NA, c(n_ind, n_prim, n_sec, n_diag))

# simulation
for (i in 1:n_ind) {
  
  # ecological process
  z[i, first[i]] <- rcat(1, c(1 - eta[first[i]], eta[first[i]]))
  for (t in (first[i] + 1):n_prim) {
    z[i, t] <- rcat(1, tpm_z[z[i, t - 1], , i, t - 1])
  } # t
  
  for (t in first[i]:n_prim) {
    for (k in 1:n_sec) {
      
      # observation process
      o[i, t, k] <- rcat(1, tpm_o[z[i, t], , i, t, k])
      
      # diagnostic process
      y[i, t, k, ] <- rcat(n_diag, tpm_y[o[i, t, k], , i, t, k])
      
      # diagnostic run infection intensity
      for (l in 1:n_diag) {
        if (y[i, t, k, l] == 2) {
          x[i, t, k, l] <- rlnorm(1, log(n[i, t, k]), sigma[3])
        }
      } # l
    } # k
  } # t
} # i

Stan program

Here’s the Stan program.

Code
functions {
  /**
   * Return partial sum over individuals applying the forward algorithm for multievent mark-recapture model formulated as hidden Markov model
   *
   * @param seq_ind    Sequence of individuals from 1 to n_ind
   * @param start      Start of seq_ind for partial sum
   * @param end        End of seq_ind for partial sum
   * @param y          Detection history
   * @param q          Logical indicating whether individual was detected in each secondary
   * @param n_prim     Number of primary occasions
   * @param n_sec      Number of secondary occasions per primary
   * @param first      Primary occasion of first capture per individual
   * @param last       Primary occasion of last capture per individual
   * @param first_sec  Secondary of first capture in primary of first capture
   * @param tau        Time intervals between primary occasions
   * @param temp       Average temperatures between primary occasions
   * @param phi_a      Mortality rates of uninfected and infected individuals (intercepts)
   * @param phi_b      Effect of one log gene copy of pathogen on mortality rate
   * @param psi_a      Rates of gaining and clearing infections (intercepts)
   * @param psi_b      Effects of temperature on rates of gaining and clearing infections
   * @param p_a        Detection probabilities of uninfected and infected individuals
   * @param r          Probabilities of detecting one log gene copy of pathogen on individuals (sampling process) and samples (diagnostic process)
   * @param lambda     False positive probabilities in sampling and diagnostic processes
   * @param m          Individual infection intensities per primary
   * @param n_z        z-scores of sample infection intensities per secondary
   * @param sigma      Vector of infection intensity SDs
   *
   * @return           Log probability for subset of individuals
   */
  real partial_sum_lpmf(
    data array[] int seq_ind, data int start, data int end, data array[,,,] int y, data array[,,] int q, 
    data int n_prim, data int n_sec, data array[] int first, data array[] int last, data array[] int first_sec, data vector tau, data vector temp, 
    vector phi_a, real phi_b, vector psi_a, vector psi_b, vector p_a, vector r, vector lambda, matrix m, array[] matrix n_z, vector sigma) {
      
      // number of individuals in partial sum and initialise partial target
      int n_ind = end - start + 1;
      real ptarget = 0;
      
      // containers
      array[2] vector[n_prim - 1] phi, psi, psi_p;
      vector[n_prim - 1] eta, log_eta, log1m_eta;
      array[2] matrix[n_sec, n_prim] p;
      matrix[n_ind, n_prim] log_m = log(m[:, start:end])';
      array[n_ind] matrix[n_sec, n_prim] n;
      tuple(vector[n_prim], matrix[n_sec, n_prim]) delta;
      
      // transform intercepts
      vector[2] log_phi_a = log(phi_a);
      vector[2] log_psi_a = log(psi_a);
      
      // transition rate/log probability matrices (TRM/LTPM)
      array[n_prim - 1] matrix[3, 3] trm;
      array[n_prim - 1] matrix[3, 2] ltpm_z;
      array[n_prim, n_sec] matrix[2, 3] ltpm_o;
      array[n_prim, n_sec] matrix[2, 2] ltpm_y;
      
      // log probabilities
      array[n_prim] matrix[2, n_sec] omega;
      matrix[3, n_prim] lmps;
      
      // for each individual
      for (i in 1:n_ind) {
        
        // index that maps to original n_ind
        int ii = i + start - 1;
        
        // ecological process parameters
        phi[1] = rep_vector(phi_a[1], n_prim - 1);
        phi[2] = exp(log_phi_a[2] + phi_b * m[1:(n_prim - 1), ii]);
        for (s in 1:2) {
          psi[s] = exp(log_psi_a[s] + psi_b[s] * temp);
          psi_p[s] = 1 - exp(-psi[s]);
        }
        eta = psi_p[1] ./ (psi_p[1] + psi_p[2]);
        log_eta = log(eta);
        log1m_eta = log1m(eta);
        
        // primary occasion intervals
        for (t in first[ii]:(n_prim - 1)) {
          
          // ecological TRM
          trm[t][1, 1] = -(psi[1][t] + phi[1][t]);
          trm[t][1, 2] = psi[1][t];
          trm[t][1, 3] = phi[1][t];
          trm[t][2, 1] = psi[2][t];
          trm[t][2, 2] = -(psi[2][t] + phi[2][t]);
          trm[t][2, 3] = phi[2][t];
          trm[t][3] = zeros_row_vector(3);
          trm[t] = trm[t]';
          
          // ecological log TPM
          ltpm_z[t] = log(matrix_exp(trm[t] * tau[t])[:, 1:2]);
          
        } // t
        
        // sample infection intensities
        ptarget += std_normal_lupdf(to_vector(n_z[ii]));
        n[i] = exp(rep_matrix(log_m[i], n_sec) + n_z[ii] * sigma[2]);
        
        // individual and sample infection detection probabilities
        delta.1 = 1 - pow(1 - r[1], m[:, ii]);
        delta.2 = 1 - pow(1 - r[2], n[i]);
        
        // detection probabilities (fixed at 1 for secondary of first capture)
        for (s in 1:2) {
          p[s] = rep_matrix(p_a[s], n_sec, n_prim);
          p[s][first_sec[ii], first[ii]] = 1;
        } // s
        
        // initialise log MPS
        lmps = rep_matrix(0, 3, n_prim);
        
        // for each primary occasion
        for (t in first[ii]:n_prim) {
          
          // for each secondary occasion
          for (k in 1:n_sec) {
            
            // observation log TPM
            ltpm_o[t, k][1, 1] = log(p[1][k, t]) + log1m(lambda[1]);
            ltpm_o[t, k][1, 2] = log(p[1][k, t]) + log(lambda[1]);
            ltpm_o[t, k][1, 3] = log1m(p[1][k, t]);
            ltpm_o[t, k][2, 1] = log(p[2][k, t]) + log1m(delta.1[t]);
            ltpm_o[t, k][2, 2] = log(p[2][k, t]) + log(delta.1[t]);
            ltpm_o[t, k][2, 3] = log1m(p[2][k, t]);
            
            // diagnostic log TPM
            ltpm_y[t, k][1, 1] = log1m(lambda[2]);
            ltpm_y[t, k][1, 2] = log(lambda[2]);
            ltpm_y[t, k][2, 1] = log1m(delta.2[k, t]);
            ltpm_y[t, k][2, 2] = log(delta.2[k, t]);
            
            // if captured during secondary
            if (q[ii, t, k]) {
              
              // diagnostic probabilities (y[ii, t, k] is an array of n_diag integers)
              for (o in 1:2) {
                omega[t][o, k] = sum(ltpm_y[t, k][o, y[ii, t, k]]);
              }
              
              // accrue log marginal probabilities with observation process
              lmps[1:2, t] += log_product_exp(ltpm_o[t, k][:, 1:2], omega[t][:, k]);
              
              // if not captured, observation process only
            } else {
              lmps[1:2, t] += ltpm_o[t, k][:, 3];
            }
          } // k
        } // t
        
        // marginal log probabilities of alive states at first capture
        lmps[1:2, first[ii]] += [ log1m_eta[first[ii]], log_eta[first[ii]] ]';
        
        // marginal log probabilities of alive states between first and last primary with captures
        if (first[ii] < last[ii]) {
          for (t in (first[ii] + 1):last[ii]) {
            lmps[1:2, t] += log_product_exp(ltpm_z[t - 1][1:2], lmps[1:2, t - 1]);
          } // t
        }
        
        // if last captured in the last primary
        if (last[ii] == n_prim) {
          
          // increment target with marginal log probabilities of alive states
          ptarget += log_sum_exp(lmps[1:2, n_prim]);
        } else {
          
          // marginal log probabilities of all ecological states in primary after last capture
          lmps[1:2, last[ii] + 1] += log_product_exp(ltpm_z[last[ii]][1:2], lmps[1:2, last[ii]]);
          lmps[3, last[ii] + 1] = log_product_exp(ltpm_z[last[ii]][3], lmps[1:2, last[ii]]);
          
          // if there are more primaries
          if ((last[ii] + 1) < n_prim) {
            
            // marginal log probabilities of all ecological states until last primary
            for (t in (last[ii] + 2):n_prim) {
              lmps[1:2, t] += log_product_exp(ltpm_z[t - 1][1:2], lmps[1:2, t - 1]);
              lmps[3, t] = log_sum_exp(log_product_exp(ltpm_z[t - 1][3], lmps[1:2, t - 1]), lmps[3, t - 1]);
            } // t
          }
          
          // increment target with marginal log probabilities of all ecological states
          ptarget += log_sum_exp(lmps[:, n_prim]);
        }
      } // i
      
      return ptarget;
    }
    
  /**
   * Return the natural logarithm of the product of the element-wise exponentiation of the specified matrices
   *
   * @param a  First matrix or (row_)vector
   * @param b  Second matrix or (row_)vector
   *
   * @return   log(exp(a) * exp(b))
   */
  matrix log_product_exp(matrix a, matrix b) {
    int x = rows(a);
    int y = cols(b);
    int z = cols(a);
    matrix[z, x] a_tr = a';
    matrix[x, y] c;
    for (j in 1:y) {
      for (i in 1:x) {
        c[i, j] = log_sum_exp(a_tr[:, i] + b[:, j]);
      }
    }
    return c;
  }
  vector log_product_exp(matrix a, vector b) {
    int x = rows(a);
    int z = cols(a);
    matrix[z, x] a_tr = a';
    vector[x] c;
    for (i in 1:x) {
      c[i] = log_sum_exp(a_tr[:, i] + b);
    }
    return c;
  }
  row_vector log_product_exp(row_vector a, matrix b) {
    int y = cols(b);
    vector[size(a)] a_tr = a';
    row_vector[y] c;
    for (j in 1:y) {
      c[j] = log_sum_exp(a_tr + b[:, j]);
    }
    return c;
  }
  real log_product_exp(row_vector a, vector b) {
    real c = log_sum_exp(a' + b);
    return c;
  }
}

data {
  int grainsize, n_ind, n_prim, n_sec, n_diag, n_x;
  array[n_ind] int<lower=1, upper=n_prim> first, last, first_sec;
  vector<lower=0>[n_prim - 1] tau;
  vector[n_prim - 1] temp;
  array[n_ind, n_prim, n_sec, n_diag] int<lower=1, upper=3> y;
  array[n_x] int<lower=1> ind, prim, sec;
  vector[n_x] x;
}

transformed data {
  array[n_ind] int seq_ind = linspaced_int_array(n_ind, 1, n_ind);
  array[n_ind, n_prim, n_sec] int<lower=0, upper=1> q = rep_array(0, n_ind, n_prim, n_sec);
  for (i in 1:n_ind) {
    for (t in first[i]:n_prim) {
      for (k in 1:n_sec) {
        if (min(y[i, t, k]) < 3) {
          q[i, t, k] = 1;
        }
      }
    }
  }
}

parameters {
  vector<lower=0>[2] phi_a, psi_a;
  real phi_b;
  vector[2] psi_b;
  vector<lower=0, upper=1>[2] p_a, r;
  vector<lower=0, upper=r>[2] lambda;
  real mu;
  matrix<lower=0>[n_prim, n_ind] m;
  array[n_ind] matrix[n_sec, n_prim] n_z;
  vector<lower=0>[3] sigma;            
}

model {
  // log sample infection intensities
  array[n_ind] matrix[n_sec, n_prim] log_n;
  matrix[n_ind, n_prim] log_m = log(m)';
  for (i in 1:n_ind) {
    log_n[i] = rep_matrix(log_m[i], n_sec) + n_z[i] * sigma[2];
  }
  
  // likelihood of diagnostic infection intensities
  for (j in 1:n_x) {
    target += lognormal_lupdf(x[j] | log_n[ind[j]][sec[j], prim[j]], sigma[3]);
  }
  
  // likelihood of CMR
  target += reduce_sum(partial_sum_lupmf, seq_ind, grainsize, y, q, 
                       n_prim, n_sec, first, last, first_sec, tau, temp, 
                       phi_a, phi_b, psi_a, psi_b, p_a, r, lambda, m, n_z, sigma);
                       
  // priors
  target += exponential_lupdf(phi_a | 1);
  target += std_normal_lupdf(phi_b);
  target += exponential_lupdf(psi_a | 1);
  target += std_normal_lupdf(psi_b);
  target += beta_lupdf(p_a | 1, 1);
  target += beta_lupdf(r | 1, 1);
  target += beta_lupdf(lambda | 1, 10);
  target += normal_lupdf(mu | 0, 1);
  target += exponential_lupdf(sigma | 1);
  target += lognormal_lupdf(to_vector(m) | mu, sigma[1]);
}

generated quantities {
  // containers
  array[n_ind, n_prim] int z;
  vector[n_ind] log_lik;
  {
    array[2] vector[n_prim - 1] phi, psi, psi_p;
    vector[n_prim - 1] eta, log_eta, log1m_eta;
    array[2] matrix[n_sec, n_prim] p;
    matrix[n_ind, n_prim] log_m = log(m)';
    array[n_ind] matrix[n_sec, n_prim] n;
    tuple(vector[n_prim], matrix[n_sec, n_prim]) delta;
    vector[2] log_phi_a = log(phi_a);
    vector[2] log_psi_a = log(psi_a);
    array[n_prim - 1] matrix[3, 3] trm;
    array[n_prim - 1] matrix[3, 3] ltpm_z;
    array[n_prim, n_sec] matrix[2, 3] ltpm_o;
    array[n_prim, n_sec] matrix[2, 2] ltpm_y;
    array[n_prim] matrix[2, n_sec] omega;
    array[3, n_prim] int back_ptr;
    matrix[3, n_prim] lmps, best_lp;
    real tmp;
    int tt;
    
    for (i in 1:n_ind) {
    
      phi[1] = rep_vector(phi_a[1], n_prim - 1);
      phi[2] = exp(log_phi_a[2] + phi_b * m[1:(n_prim - 1), i]);
      for (s in 1:2) {
        psi[s] = exp(log_psi_a[s] + psi_b[s] * temp);
        psi_p[s] = 1 - exp(-psi[s]);
      }
      eta = psi_p[1] ./ (psi_p[1] + psi_p[2]);
      log_eta = log(eta);
      log1m_eta = log1m(eta);
      for (t in first[i]:(n_prim - 1)) {
        trm[t][1, 1] = -(psi[1][t] + phi[1][t]);
        trm[t][1, 2] = psi[1][t];
        trm[t][1, 3] = phi[1][t];
        trm[t][2, 1] = psi[2][t];
        trm[t][2, 2] = -(psi[2][t] + phi[2][t]);
        trm[t][2, 3] = phi[2][t];
        trm[t][3] = zeros_row_vector(3);
        trm[t] = trm[t]';
        // dead state remains for Viterbi algorithm
        ltpm_z[t] = log(matrix_exp(trm[t] * tau[t]));
      } // t
      n[i] = exp(rep_matrix(log_m[i], n_sec) + n_z[i] * sigma[2]);
      delta.1 = 1 - pow(1 - r[1], m[:, i]);
      delta.2 = 1 - pow(1 - r[2], n[i]);
      for (s in 1:2) {
        p[s] = rep_matrix(p_a[s], n_sec, n_prim);
        p[s][first_sec[i], first[i]] = 1;
      } // s
      lmps = rep_matrix(0, 3, n_prim);
      best_lp = rep_matrix(negative_infinity(), 3, n_prim);
      
      for (t in first[i]:n_prim) {
        for (k in 1:n_sec) {
          ltpm_o[t, k][1, 1] = log(p[1][k, t]) + log1m(lambda[1]);
          ltpm_o[t, k][1, 2] = log(p[1][k, t]) + log(lambda[1]);
          ltpm_o[t, k][1, 3] = log1m(p[1][k, t]);
          ltpm_o[t, k][2, 1] = log(p[2][k, t]) + log1m(delta.1[t]);
          ltpm_o[t, k][2, 2] = log(p[2][k, t]) + log(delta.1[t]);
          ltpm_o[t, k][2, 3] = log1m(p[2][k, t]);
          ltpm_y[t, k][1, 1] = log1m(lambda[2]);
          ltpm_y[t, k][1, 2] = log(lambda[2]);
          ltpm_y[t, k][2, 1] = log1m(delta.2[k, t]);
          ltpm_y[t, k][2, 2] = log(delta.2[k, t]);
          // observation log probabilities
          if (q[i, t, k]) {
            for (o in 1:2) {
              omega[t][o, k] = sum(ltpm_y[t, k][o, y[i, t, k]]);
            }
            lmps[1:2, t] += log_product_exp(ltpm_o[t, k][:, 1:2], omega[t][:, k]);
          } else {
            lmps[1:2, t] += ltpm_o[t, k][:, 3];
          }
        } // k
      } // t
      
      // first primary
      lmps[1:2, first[i]] = [ log1m_eta[first[i]], log_eta[first[i]] ]' + lmps[1:2, first[i]];
      best_lp[1:2, first[i]] = lmps[1:2, first[i]];
      
      // Viterbi
      if (first[i] < last[i]) {
        for (t in (first[i] + 1):last[i]) {
          for (s_a in 1:2) {  // state of arrival
            for (s_d in 1:2) {  // state of departure
              tmp = best_lp[s_d, t - 1] + ltpm_z[t - 1][s_a, s_d] + lmps[s_a, t];
              if (tmp > best_lp[s_a, t]) {
                back_ptr[s_a, t] = s_d;
                best_lp[s_a, t] = tmp;
              }
            } // s_d
          } // s_a
          // increment with ecological process for log-lik
          lmps[1:2, t] += log_product_exp(ltpm_z[t - 1][1:2, 1:2], lmps[1:2, t - 1]);
        } // t
      }
      if (last[i] < n_prim) {
        for (t in (last[i] + 1):n_prim) {
          for (s_a in 1:3) {  // state of arrival
            for (s_d in 1:3) {  // state of departure
              tmp = best_lp[s_d, t - 1] + ltpm_z[t - 1][s_a, s_d] + lmps[s_a, t];
              if (tmp > best_lp[s_a, t]) {
                back_ptr[s_a, t] = s_d;
                best_lp[s_a, t] = tmp;
              }
            } // s_d
          } // s_a
          // increment with ecological process for log-lik
          lmps[:, t] += log_product_exp(ltpm_z[t - 1], lmps[:, t - 1]);
        } // t
        log_lik[i] = log_sum_exp(lmps[:, n_prim]);
      } else {
        // last captured in the final primary: only the alive states contribute
        log_lik[i] = log_sum_exp(lmps[1:2, n_prim]);
      }
      
      // ecological states
      tmp = max(best_lp[:, n_prim]);
      for (s_a in 1:3) { // state of arrival
        if (best_lp[s_a, n_prim] == tmp) {
          z[i, n_prim] = s_a;
        }
      } // s_a
      for (t in first[i]:(n_prim - 1)) {
        tt = n_prim - t + first[i];
        z[i, tt - 1] = back_ptr[z[i, tt], tt];
      } // t
    } // i
  }
}

Next I prepare the data for Stan and run HMC with threading enabled. On this M2 MacBook Pro I have 10 cores, meaning that with just 2 HMC chains I can spread the work across 5 cores each. The grainsize argument suggests how to partition the work inside the partial sum function across threads.

Code
# convert diagnostic infection intensities to long format
x_long <- reshape2::melt(x, value.name = "x", varnames = c("ind", "prim", "sec", "diag")) |> 
  as_tibble() |>
  drop_na() |>
  arrange(ind)

# data for Stan
n_chains <- 2
n_cores <- parallel::detectCores()
n_threads <- floor(n_cores / n_chains)
first_sec <- rep(1, n_ind)  # simulated to be first captured in first secondary
last <- apply(y, 1:2, min) |> apply(1, \(y) max(which(y < 3)))
goat_data <- list(grainsize = floor(n_ind / n_threads / 4),
                  n_ind = n_ind, n_prim = n_prim, n_sec = n_sec, n_diag = n_diag,
                  first = first, last = last, first_sec = first_sec, tau = tau, temp = temp,
                  y = ifelse(is.na(y), 3, y),
                  n_x = nrow(x_long), ind = x_long$ind, prim = x_long$prim, sec = x_long$sec,
                  x = x_long$x)

# run HMC
fit_goat <- hmm_goat$sample(data = goat_data, refresh = 0, init = 0.1, 
                            threads_per_chain = n_threads, chains = n_chains, parallel_chains = n_chains, iter_sampling = n_iter)
Running MCMC with 2 parallel chains...

Chain 2 finished in 712.9 seconds.
Chain 1 finished in 719.9 seconds.

Both chains finished successfully.
Mean chain execution time: 716.4 seconds.
Total execution time: 720.1 seconds.

Results

We’ll visually check the parameter estimates against the simulation input and see that they were recovered well. For what it’s worth, in the paper we showed that the infection dynamics get overestimated considerably (5-fold in our application) when not accounting for state misclassification.

Code
# check estimates
fit_goat$summary(c("phi_a", "phi_b", "psi_a", "psi_b", "p_a", "r", "lambda", "mu", "sigma")) |>
  mutate(truth = c(phi_a, phi_b, psi_a, psi_b, p_a, r, lambda, mu, sigma),
         parameter = c(rep("phi", 3), 
                       rep(c("psi", "psi", "p", "r", "lambda"), each = 2), 
                       "mu", rep("sigma", 3)) |>
           factor(levels = c("phi", "psi", "p", "r", "lambda", "mu", "sigma")),
         variable = factor(variable) |>
           str_remove(c("_a")) |> 
           str_replace("phi\\[2\\]", "phi\\[2\\[alpha\\]\\]") |>
           str_replace("phi_b", "phi\\[2\\[beta\\]\\]") |>
           str_replace("psi\\[1\\]", "psi\\[1\\[alpha\\]\\]") |>
           str_replace("psi\\[2\\]", "psi\\[2\\[alpha\\]\\]") |>
           str_replace("psi_b\\[1\\]", "psi\\[1\\[beta\\]\\]") |>
           str_replace("psi_b\\[2\\]", "psi\\[2\\[beta\\]\\]") |>
           fct_reorder(as.numeric(parameter)),
         process = c(rep("Ecological Process", 3 + 2 + 2),
                     rep("Observation and Diagnostic Processes", 2 * 3),
                     rep("Infection Intensities", 1 + 3)) |>
           factor(levels = c("Ecological Process", "Observation and Diagnostic Processes", "Infection Intensities"))) |>
  ggplot(aes(median, fct_rev(variable))) +
  facet_wrap(~ process, scales = "free_y", ncol = 1) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "#333333")  +
  geom_pointrange(aes(xmin = q5, xmax = q95), position = position_nudge(y = 1/10)) +
  geom_point(aes(truth), colour = green, position = position_nudge(y = -1/10)) +
  scale_x_continuous(breaks = seq(-1, 1, 0.5), expand = c(0, 0)) +
  scale_y_discrete(labels = ggplot2:::parse_safe) +
  labs(x = "Posterior", y = "Parameter")

Again, we can check how well we recovered the latent ecological states, which were completely unobservable from the data.

Code
z_est <- fit_goat$summary("z")$median |>
  matrix(n_ind, n_prim) |>
  {\(z) ifelse(z < 1, NA, z)}()
sum(z_est == z, na.rm = T) / sum(!is.na(z))
[1] 0.94

One of the main reasons to account for imperfect pathogen detection is to get more realistic estimates of infection prevalence in the population. We can use the full posterior distribution of the latent ecological states from the Viterbi algorithm to estimate the infection prevalence per primary occasion, which here was simulated to increase as temperatures drop going into austral winter. I plot the estimates against some fictional dates to show the unequal primary occasion intervals.

Code
# dates
start <- ymd("2020-12-01")
time_unit <- 21
dates <- seq.Date(start, start + days(1 + round(sum(tau) * time_unit)), by = 1)
prim <- dates[c(1, sapply(1:(n_prim - 1), \(t) round(sum(tau[1:t] * time_unit))))]

# plot
fit_goat$draws("z", format = "draws_df") |>
  pivot_longer(contains("z"), values_to = "z") |>
  mutate(z = if_else(z < 1, NA, z),
         ind = rep(1:n_ind, n() / n_ind) |> factor(),
         prim = rep(prim, each = n_ind) |> rep(n() / n_ind / n_prim)) |>
  mutate(alive = sum(z == 1 | z == 2, na.rm = T),
         infected = sum(z == 2, na.rm = T),
         prevalence = infected / alive,
         .by = c(.draw, prim)) |>
  summarise(mean = mean(prevalence),
            q5 = quantile(prevalence, 0.05),
            q95 = quantile(prevalence, 0.95),
            .by = prim) |>
  ggplot(aes(prim, mean)) +
  geom_pointrange(aes(ymin = q5, ymax = q95), position = position_nudge(x = 2)) +
  geom_point(aes(y = infected / alive),
             data = tibble(prim = prim,
                           alive = apply(z, 2, \(z) sum(z == 1 | z == 2, na.rm = T)),
                           infected = apply(z, 2, \(z) sum(z == 2, na.rm = T))),
             colour = green, position = position_nudge(x = -2)) +
  scale_y_continuous(breaks = seq(0.2, 1, 0.2), limits = c(0, 1.001), expand = c(0, 0)) +
  labs(x = "Primary Occasion", y = "Infection Prevalence")

Thanks for reading if you made it this far. I hope this guide will be useful for ecologists (and others) wanting to transition to gradient-based sampling methods for complex ecological models. I’m sure I’ve made some errors somewhere, so if you find any or have any questions, please don’t hesitate to reach out.

References

Betancourt, M. 2018, July 15. A conceptual introduction to Hamiltonian Monte Carlo. arXiv.
Carpenter, B., A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell. 2017. Stan: A Probabilistic Programming Language. Journal of Statistical Software 76:1–32.
Cooch, E. G., and G. C. White. 2008. Program MARK: A gentle introduction. 19th edition.
de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. T. Lang, and R. Bodik. 2017. Programming with models: Writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics 26:403–413.
Ergon, T., Ø. Borgan, C. R. Nater, and Y. Vindenes. 2018. The utility of mortality hazard rates in population analyses. Methods in Ecology and Evolution 9:2046–2056.
Gabry, J., R. Češnovar, and A. Johnson. 2023. CmdStanR: R interface to “CmdStan”.
Glennie, R., T. Adam, V. Leos‐Barajas, T. Michelot, T. Photopoulou, and B. T. McClintock. 2022. Hidden Markov models: Pitfalls and opportunities in ecology. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13801.
Goulet, V., C. Dutang, M. Maechler, D. Firth, M. Shapira, and M. Stadelmann. 2021. Package “expm”.
Hoffman, M. D., and A. Gelman. 2014. The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15:1593–1623.
Hollanders, M., and J. A. Royle. 2022. Know what you don’t know: Embracing state uncertainty in disease-structured multistate models. Methods in Ecology and Evolution 13:2827–2837.
Kéry, M., and J. A. Royle. 2015. Applied Hierarchical Modeling in Ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 1: Prelude and Static Models. Academic Press.
Kéry, M., and J. A. Royle. 2020. Applied hierarchical modeling in ecology: Analysis of distribution, abundance and species richness in r and BUGS: Volume 2: Dynamic and advanced models. Academic Press.
Neal, R. M. 1994. An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics 111:194–203.
Pollock, K. H. 1982. A capture-recapture design robust to unequal probability of capture. The Journal of Wildlife Management 46:752–757.
R Core Team. 2023. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Royle, J. A., and W. A. Link. 2006. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87:835–841.
Scheele, B. C., F. Pasmans, L. Skerratt, L. Berger, A. Martel, W. Beukema, A. A. Acevedo, P. A. Burrowes, T. Carvalho, A. Catenazzi, C. N. Foster, P. Frías-Álvarez, T. W. J. Garner, B. Gratwicke, J. M. Guayasamin, M. Hirschfeld, J. E. Kolby, T. A. Kosch, A. V. Longo, R. Maneyro, C. A. McDonald, J. R. Mendelson III, P. Palacios-Rodriguez, G. Parra-Olea, C. L. Richards-Zawacki, M.-O. Rödel, S. M. Rovito, C. Soto-Azat, L. F. Toledo, J. Voyles, C. Weldon, S. M. Whitfield, M. Wilkinson, K. R. Zamudio, and S. Canessa. 2019. Amphibian fungal panzootic causes catastrophic and ongoing loss of biodiversity. Science 363:1459–1463.
Vehtari, A., J. Gabry, M. Magnusson, Y. Yao, P.-C. Bürkner, T. Paananen, and A. Gelman. 2023. loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models.
Wickham, H. 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York.
Wickham, H., M. Averick, J. Bryan, W. Chang, L. D. McGowan, R. François, G. Grolemund, A. Hayes, L. Henry, J. Hester, M. Kuhn, T. L. Pedersen, E. Miller, S. M. Bache, K. Müller, J. Ooms, D. Robinson, D. P. Seidel, V. Spinu, K. Takahashi, D. Vaughan, C. Wilke, K. Woo, and H. Yutani. 2019. Welcome to the tidyverse. Journal of Open Source Software 4:1686.
Youngflesh, C. 2018. MCMCvis: tools to visualize, manipulate, and summarize MCMC output. Journal of Open Source Software 3:640.