Session 4 Example

Preparation

In this tutorial we will be using the cmdstanr R interface to CmdStan, the posterior package for working with posterior draws, and the bayesplot package for our graphical model checking:

library(cmdstanr)
library(posterior)
library(bayesplot)
library(ggplot2)
library(dplyr)

Epilepsy RCT Model Expansion

For this example, we are analysing the results of a randomised controlled trial for a new epilepsy treatment. To briefly summarise our data:

  • 59 participants randomised to either treatment or control
  • 5 assessments (baseline + 4 follow-ups)
  • Outcome is the number of seizures experienced

Data

Let’s load our dataset and look at the general structure:

data(epilepsy, package="brms")
head(epilepsy)
  Age Base Trt patient visit count obs       zAge      zBase
1  31   11   0       1     1     5   1  0.4249950 -0.7571728
2  30   11   0       2     1     3   2  0.2652835 -0.7571728
3  25    6   0       3     1     2   3 -0.5332740 -0.9444033
4  36    8   0       4     1     4   4  1.2235525 -0.8695111
5  22   66   0       5     1     7   5 -1.0124085  1.3023626
6  29   27   0       6     1     5   6  0.1055720 -0.1580352

Note that the continuous covariates have been standardised to simplify prior specification - this might not always be appropriate!
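As a quick check (my addition: I’m assuming the z-variables are the usual centre-and-scale transformation), we can confirm how they relate to the raw covariates:

# Check (assumption): zAge should match the centred & scaled Age
epilepsy |>
  mutate(zAge_check = as.numeric(scale(Age))) |>
  summarise(max_diff = max(abs(zAge_check - zAge)))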

Initial Model

What do we believe will be related to the seizure counts in our trial?

  • Treatment or Control group
  • Time since beginning treatment
    • Potentially differing between treatment & control (an interaction)
  • Baseline number of seizures
  • Age of participant

In R formula notation:

count ~ Trt + visit + Trt:visit + zBase + zAge
epilepsy_prep <- epilepsy |>
  mutate(Trt = as.numeric(Trt) - 1,       # 0 = control, 1 = treatment
         visit = as.numeric(visit) - 1,   # visits numbered 0-3
         treat_x_visit = Trt * visit)     # treatment-by-visit interaction
epilepsy_standata <- list(
  N = length(unique(epilepsy_prep$patient)),  # number of participants
  T = length(unique(epilepsy_prep$visit)),    # number of assessments
  K = 5,                                      # number of covariates
  ID = as.integer(epilepsy_prep$patient),
  x = as.matrix(epilepsy_prep[, c("Trt", "visit", "treat_x_visit",
                                  "zAge", "zBase")]),
  seizures = epilepsy_prep$count
)

Outcome Family

Given that we are modelling counts, a Poisson distribution is a good start. When using a Poisson outcome in a Generalised Linear Model (GLM), we commonly use a log link function, so that covariate effects act multiplicatively on the expected count:

\[ \displaylines{ y_i \sim Poisson(\lambda_i) \\ \lambda_i = \exp(\alpha + x_i^T\beta) } \]

Priors

We will use weakly-informative priors, which regularise our estimates towards feasible/possible values. Because the intercept and coefficients are on the log scale, the interpretation and choice of priors is slightly different from the linear-model case.
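As a quick illustration (this check is my addition, not part of the original analysis), we can exponentiate draws from the candidate priors to see what they imply on the count scale:

# What does a N(0, 5) prior on the log-scale intercept imply for the baseline seizure rate?
quantile(exp(rnorm(1e5, mean = 0, sd = 5)), probs = c(0.05, 0.5, 0.95))

# And a N(0, 1) coefficient prior, as a multiplicative (rate ratio) effect?
quantile(exp(rnorm(1e5, mean = 0, sd = 1)), probs = c(0.05, 0.5, 0.95))

The N(0, 1) coefficient prior keeps most rate ratios between roughly 0.2 and 5, which is weakly informative for standardised covariates.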

Stan Model

We’ve decided to use a Poisson Generalised Linear Model with a log-link as our initial attempt for modelling the data:

\[ \displaylines{ y_i \sim Poisson(\lambda_i) \\ \lambda_i = \exp(\alpha + x_i^T\beta) \\ \alpha \sim N(0,5) \\ \beta_{1:5} \sim N(0,1) } \]

Let’s have a look at how we would specify this in Stan:

data {
  int N;
  int T;
  int K;
  array[N*T] int ID;
  matrix[N*T, K] x;
  array[N*T] int seizures;
}

parameters {
  real alpha;
  vector[K] beta;
}

transformed parameters {
  // Note: lambda is the *log* of the Poisson rate (the linear predictor)
  vector[N*T] lambda = alpha + x * beta;
}

model {
  alpha ~ normal(0, 5);
  beta ~ std_normal();
  seizures ~ poisson_log(lambda);
}

generated quantities {
  array[N*T] int ypred = poisson_log_rng(lambda);
  vector[N*T] log_lik;
  for (nt in 1:(N*T)) {
    log_lik[nt] = poisson_log_lpmf(seizures[nt] | lambda[nt]);
  }
}
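
The listings in this tutorial omit the compilation step: assuming the Stan program above is saved as poisson_model.stan (a file name chosen purely for illustration), it would first be compiled with cmdstan_model(). The later models are compiled in the same way before sampling:

poisson_mod <- cmdstan_model("poisson_model.stan")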
poisson_fit <- poisson_mod$sample(
  data = epilepsy_standata,
  parallel_chains = 4,
  refresh = 0,
  show_messages = FALSE,
  show_exceptions = FALSE,
  seed = 2024
)

To inspect our estimates, we’ll first rename them and transform them back to an interpretable scale by exponentiating, giving the baseline rate (intercept) and rate ratios (coefficients):

poisson_fit$draws() |>
    mutate_variables(intercept = exp(alpha),
                     b_treat = exp(`beta[1]`), 
                     b_visit = exp(`beta[2]`), 
                     b_treat_x_visit = exp(`beta[3]`), 
                     b_zAge = exp(`beta[4]`), 
                     b_zBase = exp(`beta[5]`)) |>
    subset_draws(variable=c("intercept", "b_treat", "b_visit", "b_treat_x_visit", "b_zAge", "b_zBase")) |>
    summarise_draws()
# A tibble: 6 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 intercept       7.23   7.22  0.394  0.398  6.61   7.89  1.00    1653.    2045.
2 b_treat         0.900  0.898 0.0659 0.0659 0.796  1.01  1.00    1600.    2053.
3 b_visit         0.957  0.956 0.0275 0.0272 0.913  1.00  1.00    1794.    2191.
4 b_treat_x_visit 0.972  0.970 0.0388 0.0383 0.911  1.04  1.00    1634.    2145.
5 b_zAge          1.15   1.15  0.0286 0.0278 1.11   1.20  1.00    3037.    2596.
6 b_zBase         1.83   1.83  0.0247 0.0245 1.79   1.87  1.00    2671.    2349.

We can see that all R-hats and ESS values are within acceptable ranges. Next, we check the posterior predictives for any obvious issues; the PIT-ECDF should stay close to uniform (within the plotted bands) if the model is well calibrated:

ppc_pit_ecdf(y = epilepsy_standata$seizures,
             yrep = poisson_fit$draws("ypred", format = "draws_matrix"))

The PIT ECDF falls well outside the uniformity bands: the model is clearly inappropriate for the current dataset.
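A simple summary of the raw counts hints at why (a quick check of my own, and only a heuristic, since the Poisson assumption is conditional on the covariates): the variance is far larger than the mean, whereas the Poisson forces them to be equal.

# Compare the marginal mean and variance of the observed counts
epilepsy_prep |>
  summarise(mean_count = mean(count), var_count = var(count))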

Random-Effects Model (Normal)

We’ll add a random intercept for each individual, to relax the Poisson’s assumption of equal mean and variance. This also helps account for the non-independence of the observations, since there are repeated measurements on the same individual:

\[ \displaylines{ y_i \sim Poisson(\lambda_i) \\ \lambda_i = \exp(\alpha + x_i^T\beta + u_{ID[i]}) \\ \alpha \sim N(0,5) \\ \beta_{1:5} \sim N(0,1) \\ u_{1:N} \sim N(0,\sigma) \\ \sigma \sim Cauchy^+(0,5) } \]

data {
  int N;
  int T;
  int K;
  array[N*T] int ID;
  matrix[N*T, K] x;
  array[N*T] int seizures;
}

parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> u_sd;
  vector[N] u;
}

transformed parameters {
  vector[N*T] lambda = u[ID] + alpha + x * beta;
}

model {
  alpha ~ normal(0, 5);
  beta ~ std_normal();
  u_sd ~ cauchy(0, 5);
  u ~ normal(0, u_sd);
  seizures ~ poisson_log(lambda);
}

generated quantities {
  array[N*T] int ypred = poisson_log_rng(lambda);
  vector[N*T] log_lik;
  for (nt in 1:(N*T)) {
    log_lik[nt] = poisson_log_lpmf(seizures[nt] | lambda[nt]);
  }
}
poisson_ranef_fit <- poisson_ranef_mod$sample(
  data = epilepsy_standata,
  parallel_chains = 4,
  refresh = 0,
  show_messages = FALSE,
  show_exceptions = FALSE,
  seed = 2024
)

We can see again that the R-hats & ESS are still acceptable (slightly worse ESS, but we’ll get to that!), and the posterior-predictive check is much better.

poisson_ranef_fit$draws() |>
    mutate_variables(intercept = exp(alpha),
                     b_treat = exp(`beta[1]`), 
                     b_visit = exp(`beta[2]`), 
                     b_treat_x_visit = exp(`beta[3]`), 
                     b_zAge = exp(`beta[4]`), 
                     b_zBase = exp(`beta[5]`)) |>
    subset_draws(variable=c("intercept", "b_treat", "b_visit", "b_treat_x_visit", "b_zAge", "b_zBase")) |>
    summarise_draws()
# A tibble: 6 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 intercept       6.30   6.27  0.776  0.781  5.10   7.62  1.00     812.    1312.
2 b_treat         0.818  0.808 0.140  0.137  0.609  1.06  1.00     767.    1425.
3 b_visit         0.958  0.957 0.0275 0.0270 0.913  1.00  1.00    2191.    2636.
4 b_treat_x_visit 0.970  0.970 0.0393 0.0394 0.907  1.03  1.00    2210.    2657.
5 b_zAge          1.09   1.09  0.0886 0.0891 0.950  1.24  1.00     743.    1458.
6 b_zBase         2.07   2.06  0.165  0.159  1.82   2.35  1.00     810.    1445.
ppc_pit_ecdf(y = epilepsy_standata$seizures,
             yrep = poisson_ranef_fit$draws("ypred", format = "draws_matrix"))


The model is now better able to represent the observed data, but we are estimating an additional \(N\) parameters, and the lower ESS indicates poorer exploration of the posterior.

What can we do about this?

Random-Effects Model (Gamma)

Let’s change our normally-distributed random effect to a Gamma with equal shape and rate parameters, so that it has mean 1 and acts multiplicatively on the rate:

\[ \displaylines{ y_i \sim Poisson(\lambda_i) \\ \lambda_i = \theta_{ID[i]}\exp(\alpha + x_i^T\beta) \\ \alpha \sim N(0,5) \\ \beta_{1:5} \sim N(0,1) \\ \theta_{1:N} \sim Gamma(\phi,\phi) \\ \phi \sim Cauchy^+(0,5) } \]

data {
  int N;
  int T;
  int K;
  array[N*T] int ID;
  matrix[N*T, K] x;
  array[N*T] int seizures;
}

parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> u_shape;
  vector<lower=0>[N] u;
}

transformed parameters {
  vector[N*T] lambda = log(u[ID]) + alpha + x * beta;
}

model {
  alpha ~ normal(0, 5);
  beta ~ std_normal();
  u_shape ~ cauchy(0, 5);
  u ~ gamma(u_shape, u_shape);
  seizures ~ poisson_log(lambda);
}

generated quantities {
  array[N*T] int ypred = poisson_log_rng(lambda);
  vector[N*T] log_lik;
  for (nt in 1:(N*T)) {
    log_lik[nt] = poisson_log_lpmf(seizures[nt] | lambda[nt]);
  }
}
poisson_gamma_fit <- poisson_gamma_mod$sample(
  data = epilepsy_standata,
  parallel_chains = 4,
  refresh = 0,
  show_messages = FALSE,
  show_exceptions = FALSE,
  seed = 2024
)
poisson_gamma_fit$draws() |>
    mutate_variables(intercept = exp(alpha),
                     b_treat = exp(`beta[1]`), 
                     b_visit = exp(`beta[2]`), 
                     b_treat_x_visit = exp(`beta[3]`), 
                     b_zAge = exp(`beta[4]`), 
                     b_zBase = exp(`beta[5]`)) |>
    subset_draws(variable=c("intercept", "b_treat", "b_visit", "b_treat_x_visit", "b_zAge", "b_zBase")) |>
    summarise_draws()
# A tibble: 6 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 intercept       7.12   7.08  0.850  0.836  5.83   8.58  1.00     718.    1403.
2 b_treat         0.876  0.864 0.147  0.136  0.659  1.14  1.00     727.    1271.
3 b_visit         0.958  0.957 0.0277 0.0282 0.913  1.00  1.00    2440.    2900.
4 b_treat_x_visit 0.971  0.970 0.0391 0.0392 0.907  1.04  1.00    2285.    2832.
5 b_zAge          1.11   1.11  0.0929 0.0895 0.969  1.27  1.01     854.     992.
6 b_zBase         2.11   2.11  0.192  0.194  1.82   2.44  1.00     883.    1556.
ppc_pit_ecdf(y = epilepsy_standata$seizures,
             yrep = poisson_gamma_fit$draws("ypred", format = "draws_matrix"))

Still looking good! But what’s the point?

Random-Effects (Negative-Binomial)

Remember that we can represent a Poisson with a Gamma-distributed random effect as a Negative-Binomial parameterised by its mean and dispersion:

\[ \int Poisson(y | \lambda\theta) \cdot Gamma(\theta | \phi, \phi) d\theta = NB(y|\lambda, \phi) \]

But don’t just take my word for it: let’s verify this in R by comparing the numerically integrated Poisson-Gamma density with the Negative-Binomial:

lambda <- 2.65
y <- 4
phi <- 1.5

# Density of the Poisson-Gamma mixture, as a function of the random effect theta
poisson_gamma_pdf <- function(theta, y, lambda, phi) {
  exp(dpois(y, lambda * theta, log = TRUE) + dgamma(theta, shape = phi, rate = phi, log = TRUE))
}

# Numerically integrate the random effect out...
integrate(poisson_gamma_pdf, 0, Inf, y = y, lambda = lambda, phi = phi)
0.08891119 with absolute error < 9.4e-07
# ...and compare with the closed-form Negative-Binomial
dnbinom(y, mu = lambda, size = phi)
[1] 0.08891119

This means that we can express our random-effects model without needing to estimate the additional \(N\) parameters! Let’s see that in Stan:

data {
  int N;
  int T;
  int K;
  array[N*T] int ID;
  matrix[N*T, K] x;
  array[N*T] int seizures;
}

parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> u_shape;
}

transformed parameters {
  vector[N*T] lambda = alpha + x * beta;
}

model {
  alpha ~ normal(0, 5);
  beta ~ std_normal();
  u_shape ~ cauchy(0, 5);
  seizures ~ neg_binomial_2_log(lambda, u_shape);
}

generated quantities {
  array[N*T] int ypred = neg_binomial_2_log_rng(lambda, u_shape);
  vector[N*T] log_lik;
  for (nt in 1:(N*T)) {
    log_lik[nt] = neg_binomial_2_log_lpmf(seizures[nt] | lambda[nt], u_shape);
  }
}
nb_fit <- nb_mod$sample(
  data = epilepsy_standata,
  parallel_chains = 4,
  refresh = 0,
  show_messages = FALSE,
  show_exceptions = FALSE,
  seed = 2024
)
nb_fit$draws() |>
    mutate_variables(intercept = exp(alpha),
                     b_treat = exp(`beta[1]`), 
                     b_visit = exp(`beta[2]`), 
                     b_treat_x_visit = exp(`beta[3]`), 
                     b_zAge = exp(`beta[4]`), 
                     b_zBase = exp(`beta[5]`)) |>
    subset_draws(variable=c("intercept", "b_treat", "b_visit", "b_treat_x_visit", "b_zAge", "b_zBase")) |>
    summarise_draws()
# A tibble: 6 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 intercept       7.12   7.07  0.860  0.841  5.81   8.60  1.00    1838.    2267.
2 b_treat         0.852  0.842 0.142  0.136  0.639  1.11  1.00    1810.    2529.
3 b_visit         0.962  0.958 0.0620 0.0602 0.863  1.07  1.00    1875.    2485.
4 b_treat_x_visit 0.995  0.991 0.0902 0.0894 0.856  1.15  1.00    1794.    2597.
5 b_zAge          1.12   1.12  0.0578 0.0587 1.03   1.22  1.00    4286.    2915.
6 b_zBase         2.05   2.05  0.109  0.108  1.89   2.24  1.00    3723.    2878.
ppc_pit_ecdf(y = epilepsy_standata$seizures,
             yrep = nb_fit$draws("ypred", format = "draws_matrix"))

We can see that the estimates and PPC results are consistent with the previous random-effects (Gamma) model, but the ESS values are much improved.

Could we do better if we were concerned about efficiency?

Random-Effects (Negative-Binomial GLM)

Stan provides fused GLM likelihood functions such as neg_binomial_2_log_glm, which compute the linear predictor and likelihood in a single, more efficient call:

data {
  int N;
  int T;
  int K;
  array[N*T] int ID;
  matrix[N*T, K] x;
  array[N*T] int seizures;
}

parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> u_shape;
}

model {
  alpha ~ normal(0, 5);
  beta ~ std_normal();
  u_shape ~ cauchy(0, 5);
  seizures ~ neg_binomial_2_log_glm(x, alpha, beta, u_shape);
}

generated quantities {
  vector[N*T] lambda = alpha + x * beta;
  array[N*T] int ypred = neg_binomial_2_log_rng(lambda, u_shape);
  vector[N*T] log_lik;
  for (nt in 1:(N*T)) {
    log_lik[nt] = neg_binomial_2_log_lpmf(seizures[nt] | lambda[nt], u_shape);
  }
}
nb_glm_fit <- nb_glm_mod$sample(
  data = epilepsy_standata,
  parallel_chains = 4,
  refresh = 0,
  show_messages = FALSE,
  show_exceptions = FALSE,
  seed = 2024
)
nb_glm_fit$draws() |>
    mutate_variables(intercept = exp(alpha),
                     b_treat = exp(`beta[1]`), 
                     b_visit = exp(`beta[2]`), 
                     b_treat_x_visit = exp(`beta[3]`), 
                     b_zAge = exp(`beta[4]`), 
                     b_zBase = exp(`beta[5]`)) |>
    subset_draws(variable=c("intercept", "b_treat", "b_visit", "b_treat_x_visit", "b_zAge", "b_zBase")) |>
    summarise_draws()
# A tibble: 6 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 intercept       7.12   7.06  0.907  0.893  5.78   8.69  1.00    1523.    1865.
2 b_treat         0.853  0.838 0.151  0.147  0.627  1.12  1.00    1521.    2288.
3 b_visit         0.962  0.961 0.0641 0.0640 0.861  1.07  1.00    1576.    2134.
4 b_treat_x_visit 0.995  0.991 0.0936 0.0934 0.849  1.16  1.00    1630.    2401.
5 b_zAge          1.12   1.12  0.0589 0.0575 1.02   1.22  1.00    4173.    2959.
6 b_zBase         2.06   2.05  0.112  0.108  1.89   2.25  1.00    4037.    2695.
ppc_pit_ecdf(y = epilepsy_standata$seizures,
             yrep = nb_glm_fit$draws("ypred", format = "draws_matrix"))

We can see that the results are consistent, but now the GLM model is the fastest:

poisson_ranef_fit$time()$total
[1] 1.12257
poisson_gamma_fit$time()$total
[1] 1.441908
nb_fit$time()$total
[1] 1.028801
nb_glm_fit$time()$total
[1] 0.724715

Selecting Models

Let’s investigate whether we want to keep our Random-Effects (Normal) or Random-Effects (Negative-Binomial) model.

Explanation

Do our inferences for the treatment effect differ between the two models?

poisson_ranef_fit$draws() |>
    mutate_variables(b_treat = exp(`beta[1]`),
                     b_treat_x_visit = exp(`beta[3]`)) |>
    subset_draws(variable=c("b_treat", "b_treat_x_visit")) |>
    summarise_draws()
# A tibble: 2 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 b_treat         0.818  0.808 0.140  0.137  0.609  1.06  1.00     767.    1425.
2 b_treat_x_visit 0.970  0.970 0.0393 0.0394 0.907  1.03  1.00    2210.    2657.
nb_fit$draws() |>
    mutate_variables(b_treat = exp(`beta[1]`),
                     b_treat_x_visit = exp(`beta[3]`)) |>
    subset_draws(variable=c("b_treat", "b_treat_x_visit")) |>
    summarise_draws()
# A tibble: 2 × 10
  variable         mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>           <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 b_treat         0.852  0.842 0.142  0.136  0.639  1.11  1.00    1810.    2529.
2 b_treat_x_visit 0.995  0.991 0.0902 0.0894 0.856  1.15  1.00    1794.    2597.

We can see that the estimated rate ratios for the treatment terms are slightly closer to 1 under the negative-binomial model (a slightly weaker estimated effect), although both models have similar amounts of uncertainty (widths of the credible intervals). Additionally, the much-reduced ESS of the random-effects (normal) model reduces our confidence in its estimates.
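One way to make this concrete (an illustrative computation of my own) is the posterior probability that treatment reduces the seizure rate, i.e. that the rate ratio is below 1, under each model:

# P(rate ratio < 1) for the treatment main effect, under each model
mean(exp(poisson_ranef_fit$draws("beta[1]", format = "draws_matrix")) < 1)
mean(exp(nb_fit$draws("beta[1]", format = "draws_matrix")) < 1)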

If our goal was Explanation/Inference, we would likely select the Negative-Binomial.

Prediction

Can one of the models predict new data better than the other? We will use approximate leave-one-out cross-validation (LOO-CV) to investigate this.

poisson_ranef_loo <- poisson_ranef_fit$loo()
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
poisson_ranef_loo

Computed from 4000 by 236 log-likelihood matrix.

         Estimate   SE
elpd_loo   -671.2 37.7
p_loo        97.5 15.3
looic      1342.4 75.4
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.8]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     226   95.8%   108     
   (0.7, 1]   (bad)        7    3.0%   <NA>    
   (1, Inf)   (very bad)   3    1.3%   <NA>    
See help('pareto-k-diagnostic') for details.
plot(poisson_ranef_loo)

It looks like the PSIS algorithm is having difficulty approximating the leave-one-out performance. Any guesses why? (Hint: with a separate random effect for each individual, the posterior is highly sensitive to leaving out that individual’s observations; note the very large p_loo.)

Let’s integrate the normal random effect out of our predictive density.
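Concretely, for each observation we replace the conditional likelihood with the marginal one:

\[ p(y_i | \alpha, \beta, \sigma) = \int Poisson(y_i | \exp(\alpha + x_i^T\beta + u)) \cdot N(u | 0, \sigma) \, du \]

which the model below evaluates numerically with integrate_1d: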

functions {
  // Integrand for marginalising the normal random effect out of the
  // Poisson likelihood: p(y | u + lambda_minus_u) * p(u | 0, u_sd)
  real poisson_integrand(real u, real notused, array[] real theta,
               array[] real X_i, array[] int y_i) {
    real u_sd = theta[1];
    real lambda_minus_u = theta[2];
    real p = exp(normal_lpdf(u | 0, u_sd) + poisson_log_lpmf(y_i | u + lambda_minus_u));
    // Guard against numerical overflow/underflow in the tails
    return (is_inf(p) || is_nan(p)) ? 0 : p;
  }
  // Marginal (random-effect-integrated) Poisson log-pmf for one observation
  real poisson_log_integrated_lpmf(int y, real lambda_minus_u, real u_sd) {
    real pmf =
      integrate_1d(poisson_integrand, negative_infinity(), positive_infinity(),
                   {u_sd, lambda_minus_u}, {0}, {y});

    return log(pmf);
  }
}

data {
  int N;
  int T;
  int K;
  array[N*T] int ID;
  matrix[N*T, K] x;
  array[N*T] int seizures;
}

parameters {
  real alpha;
  vector[K] beta;
  real<lower=0> u_sd;
  vector[N] u;
}

transformed parameters {
  vector[N*T] lambda = u[ID] + alpha + x * beta;
}

model {
  alpha ~ normal(0, 5);
  beta ~ std_normal();
  u_sd ~ cauchy(0, 5);
  u ~ normal(0, u_sd);
  seizures ~ poisson_log(lambda);
}

generated quantities {
  array[N*T] int ypred = poisson_log_rng(lambda);
  vector[N*T] log_lik;
  for (nt in 1:(N*T)) {
    log_lik[nt] = poisson_log_integrated_lpmf(seizures[nt] | (lambda - u[ID])[nt], u_sd);
  }
}
poisson_ranef_int_fit <- poisson_ranef_int_mod$sample(
  data = epilepsy_standata,
  parallel_chains = 4,
  refresh = 0,
  show_messages = FALSE,
  show_exceptions = FALSE,
  seed = 2024
)

Let’s try our LOO results again:

poisson_ranef_loo <- poisson_ranef_int_fit$loo()
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
poisson_ranef_loo

Computed from 4000 by 236 log-likelihood matrix.

         Estimate   SE
elpd_loo   -669.6 20.0
p_loo        17.0  2.6
looic      1339.2 40.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.2, 0.5]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     235   99.6%   156     
   (0.7, 1]   (bad)        1    0.4%   <NA>    
   (1, Inf)   (very bad)   0    0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
plot(poisson_ranef_loo)

Even with integration, we still have a problematic data point. The next step is to try LOO with moment matching to resolve this:

poisson_ranef_loo <- poisson_ranef_int_fit$loo(moment_match = TRUE)
Compiling additional model methods...
poisson_ranef_loo

Computed from 4000 by 236 log-likelihood matrix.

         Estimate   SE
elpd_loo   -669.6 20.0
p_loo        16.5  2.2
looic      1339.2 40.0
------
MCSE of elpd_loo is 0.2.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.2, 0.5]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
plot(poisson_ranef_loo)

Much better! Now we can compare against our Negative-Binomial:

nb_loo <- nb_fit$loo()
nb_loo

Computed from 4000 by 236 log-likelihood matrix.

         Estimate   SE
elpd_loo   -664.0 19.1
p_loo         7.7  1.6
looic      1328.1 38.2
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.3]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
plot(nb_loo)

loo::loo_compare(list("Poisson" = poisson_ranef_loo, "Negative-Binomial" = nb_loo))
                  elpd_diff se_diff
Negative-Binomial  0.0       0.0   
Poisson           -5.6       3.4   

The elpd difference (-5.6) is not large relative to its standard error (3.4), so there is no meaningful difference in predictive performance between the two models. What should we do now?