Overview
vrcmort implements a hierarchical Bayesian model for
vital registration (VR) mortality counts in settings where the
reporting mechanism changes over time.
The core idea is to distinguish:
- the latent mortality process (the deaths that truly occur), and
- the reporting process (the probability that those deaths are recorded in VR).
This separation is essential when conflict simultaneously increases mortality and reduces the completeness of registration.
Throughout this vignette we describe the model in a generative way. In code, the model is fit in Stan.
Notation and indices
We index:
- region by $r = 1, \dots, R$
- time (usually month) by $t = 1, \dots, T$
- age group by $a = 1, \dots, A$
- sex by $s$
- cause group by $g = 1, \dots, G$
For many conflict applications you will start with $G = 2$:
- $g = 1$: trauma / violence-related
- $g = 2$: non-trauma
The model is written at the cell level, and the R interface expects the data to be aggregated to that level in long format, with one row per combination of region, time, age group, sex, and cause group.
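As a rough illustration of what "aggregated into long format" means, the sketch below counts individual death records into cells. It is written in Python purely for illustration; the record layout and the column names (`region`, `time`, `age`, `sex`, `cause`, `deaths`) are hypothetical and are not the names the R interface requires.

```python
from collections import Counter

# Hypothetical line list of VR death records:
# (region, month, age group, sex, cause group).
records = [
    ("north", "2023-01", "60+", "F", "non_trauma"),
    ("north", "2023-01", "60+", "F", "non_trauma"),
    ("north", "2023-01", "15-59", "M", "trauma"),
    ("south", "2023-02", "0-14", "M", "non_trauma"),
]

# Count deaths per cell; the cell key is the full combination of indices.
cell_counts = Counter(records)

# One row per cell, in long format.
columns = ("region", "time", "age", "sex", "cause")
long_rows = [
    {**dict(zip(columns, cell)), "deaths": n}
    for cell, n in sorted(cell_counts.items())
]

for row in long_rows:
    print(row)
```

Note that this sketch only produces rows for cells with at least one recorded death; whether empty cells need explicit zero rows depends on how the data are passed to the package.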
Observed data
For each cell we observe:
- $y_{r,t,a,s,g}$: the number of deaths recorded in VR
- $E_{r,t,a,s}$: the exposure (person-time at risk)
We also observe region-time covariates. Two are central:
- $c_{r,t}$: a conflict intensity proxy
- $f_{r,t}$: a health system functioning proxy (for example, facility availability)
Additional covariates can be added in the mortality submodel and/or the reporting submodel.
Exposure
Exposure is whatever scale makes your rates interpretable and comparable across cells.
- If you work with month-level counts and every month is fully covered, you can use $E_{r,t,a,s} = \text{pop}_{r,t,a,s}$ (population).
- If periods have different lengths, or if VR was operational for only part of a period, you can use person-time, for example $E_{r,t,a,s} = \text{pop}_{r,t,a,s} \times (\text{fraction of the period covered})$.
In vrcmort the expected count is proportional to exposure.
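The person-time adjustment can be sketched as follows. This is a minimal Python illustration, not package code: the helper `person_time`, its arguments, and the numbers are all hypothetical.

```python
def person_time(population, days_covered, days_in_period):
    """Exposure as population scaled by the covered fraction of the period."""
    if not 0 <= days_covered <= days_in_period:
        raise ValueError("days_covered must lie between 0 and days_in_period")
    return population * days_covered / days_in_period


# Same population, but VR operated for only half of the second month.
full_month = person_time(population=12000, days_covered=30, days_in_period=30)
half_month = person_time(population=12000, days_covered=15, days_in_period=30)

rate = 0.0005  # hypothetical deaths per person-month

# Because the expected count is proportional to exposure,
# halving the exposure halves the expected count.
print(full_month * rate)
print(half_month * rate)
```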
Latent mortality process
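Let $\lambda_{r,t,a,s,g}$ be the true death rate (per unit exposure) in a cell.
A simple generative story is:
$$D_{r,t,a,s,g} \sim \text{Poisson}(E_{r,t,a,s} \, \lambda_{r,t,a,s,g})$$
where $D_{r,t,a,s,g}$ is the number of true deaths and $E_{r,t,a,s}$ is the exposure.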
In practice, VR counts often show more variation than a Poisson model allows. vrcmort therefore uses a negative binomial observation model (see below), which can be understood as an over-dispersed Poisson model.
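To see what over-dispersion means here, the following sketch simulates counts for one cell from a Poisson model and from a negative binomial written as a gamma-Poisson mixture. It uses only the Python standard library. The expected count and the dispersion value are hypothetical, and the mixture parameterisation (variance equal to mean plus mean squared over dispersion) is an assumption about the form of negative binomial, not a statement about the Stan code.

```python
import math
import random
import statistics


def draw_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's method (fine for modest means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_negative_binomial(mean, dispersion, rng):
    """Gamma-Poisson mixture: variance is mean + mean**2 / dispersion."""
    multiplier = rng.gammavariate(dispersion, 1.0 / dispersion)  # has mean 1
    return draw_poisson(mean * multiplier, rng)


rng = random.Random(2024)
expected_deaths = 20.0  # hypothetical exposure times rate for one cell
dispersion = 5.0        # hypothetical; smaller values mean more over-dispersion

poisson_draws = [draw_poisson(expected_deaths, rng) for _ in range(5000)]
negbin_draws = [
    draw_negative_binomial(expected_deaths, dispersion, rng) for _ in range(5000)
]

# Both samples share the same mean, but the negative binomial variance is larger.
print(statistics.mean(poisson_draws), statistics.pvariance(poisson_draws))
print(statistics.mean(negbin_draws), statistics.pvariance(negbin_draws))
```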
Log-linear model for the rate
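We model the log rate as a structured additive predictor:
$$\log \lambda_{r,t,a,s,g} = \alpha_g + \alpha^{\text{age}}_{a,g} + \alpha^{\text{sex}}_{s,g} + u_{r,g} + v_{t,g} + \beta^{\text{conf}}_g \, c_{r,t} + \mathbf{x}_{r,t}^{\top} \boldsymbol{\beta}^{\text{mort}}_g$$
Interpretation of terms:
- $\alpha_g$: baseline log rate for cause $g$
- $\alpha^{\text{age}}_{a,g}$: age pattern for cause $g$
- $\alpha^{\text{sex}}_{s,g}$: sex effect for cause $g$
- $u_{r,g}$: region random intercept (partial pooling)
- $v_{t,g}$: national time random walk (smooth temporal variation)
- $\beta^{\text{conf}}_g$: effect of conflict intensity $c_{r,t}$ on the true rate
- $\boldsymbol{\beta}^{\text{mort}}_g$: additional mortality covariate effects
The conflict effect in the default model is constrained to be non-negative:
$$\beta^{\text{conf}}_g \ge 0$$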
This is a deliberate guardrail against the common artefact where a regression on observed VR counts concludes that conflict reduces mortality.
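The additive structure on the log scale can be sketched numerically. In the Python illustration below, the function `log_rate`, its argument names, and every numeric value are hypothetical; the non-negativity check stands in for the constraint and is not how the Stan program enforces it.

```python
import math


def log_rate(baseline, age_effect, sex_effect, region_effect, time_effect,
             conflict_coef, conflict):
    """Sum the additive components of the log death rate for one cell."""
    if conflict_coef < 0:
        raise ValueError("the default model constrains the conflict effect to be >= 0")
    return (baseline + age_effect + sex_effect + region_effect
            + time_effect + conflict_coef * conflict)


effects = dict(
    baseline=math.log(0.002),  # hypothetical baseline rate per unit exposure
    age_effect=0.8,
    sex_effect=0.1,
    region_effect=-0.2,
    time_effect=0.05,
    conflict_coef=0.5,
)

rate_without_conflict = math.exp(log_rate(conflict=0.0, **effects))
rate_with_conflict = math.exp(log_rate(conflict=2.0, **effects))

# On the rate scale the conflict term acts multiplicatively:
# the ratio equals exp(conflict_coef * conflict).
print(rate_with_conflict / rate_without_conflict)
```

With a non-negative coefficient, the rate under conflict can never fall below the rate without conflict, whatever the other components are.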
Reporting process (registration completeness)
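Let $\rho_{r,t,a,s,g}$ be the probability that a true death in a cell is recorded in VR.
A natural generative step is binomial thinning:
$$Y^{*}_{r,t,a,s,g} \mid D_{r,t,a,s,g} \sim \text{Binomial}(D_{r,t,a,s,g}, \, \rho_{r,t,a,s,g})$$
where $D_{r,t,a,s,g}$ is the number of true deaths and $Y^{*}_{r,t,a,s,g}$ would be the number of observed deaths if causes were perfectly classified. In the base model we do not explicitly model misclassification; instead we recommend starting with robust cause groupings (for example trauma vs non-trauma).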
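Binomial thinning is easy to simulate: each true death is recorded independently with the completeness probability. The sketch below is a hypothetical Python illustration; the counts and completeness values are made up.

```python
import random


def thin(true_deaths, completeness, rng):
    """Record each true death independently with probability `completeness`."""
    return sum(rng.random() < completeness for _ in range(true_deaths))


rng = random.Random(7)
true_deaths = 400  # hypothetical number of true deaths in a cell

# Same number of true deaths, two hypothetical completeness levels.
recorded_pre = thin(true_deaths, 0.90, rng)
recorded_post = thin(true_deaths, 0.45, rng)

print(recorded_pre, recorded_post)
```

The recorded count falls when completeness falls even though the number of true deaths is unchanged, which is the pattern the reporting submodel is designed to capture.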
Logistic model for completeness
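We model the logit of completeness:
$$\text{logit}(\rho_{r,t,a,s,g}) = \kappa^{0}_g + \kappa^{\text{post}}_g \, \text{post}_t + u^{\rho}_{r,g} + v^{\rho}_{t,g} + \gamma^{\text{conf}}_g \, c_{r,t} + \mathbf{z}_{r,t}^{\top} \boldsymbol{\gamma}^{\text{rep}}_g$$
Key pieces:
- $\kappa^{0}_g$: baseline completeness (pre-conflict) for cause $g$
- $\kappa^{\text{post}}_g$: a shift in completeness after conflict begins
- $u^{\rho}_{r,g}$: region random intercept for completeness
- $v^{\rho}_{t,g}$: national time random walk for completeness
- $\gamma^{\text{conf}}_g$: effect of conflict on completeness
- $\boldsymbol{\gamma}^{\text{rep}}_g$: additional reporting covariate effects
The indicator $\text{post}_t$ is 0 before the conflict start time and 1 afterwards.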
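A small numeric sketch of the logistic completeness model follows. The function `completeness`, its argument names, and all parameter values are hypothetical. The sketch treats the conflict start time itself as "after", which is an assumption; the text above does not say which side the start time falls on.

```python
import math


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def completeness(t, conflict_start, baseline, post_shift, region_effect,
                 time_effect, conflict_coef, conflict):
    """Completeness for one cell from additive components on the logit scale."""
    # Assumption: the start time itself counts as post-conflict.
    post = 1 if t >= conflict_start else 0
    logit_rho = (baseline + post_shift * post + region_effect
                 + time_effect + conflict_coef * conflict)
    return inv_logit(logit_rho)


params = dict(
    conflict_start=13,   # hypothetical: conflict begins in month 13
    baseline=2.0,        # logit scale, roughly 88% completeness
    post_shift=-1.0,
    region_effect=0.0,
    time_effect=0.0,
    conflict_coef=-0.5,
)

before = completeness(t=12, conflict=0.0, **params)
after = completeness(t=13, conflict=1.5, **params)
print(round(before, 3), round(after, 3))
```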
Age-selective reporting collapse
A distinctive feature of conflict VR data is that completeness can become age-selective: deaths at older ages (especially from non-trauma causes) drop out of the system.
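vrcmort represents this with an age penalty applied after conflict starts, typically only for non-trauma causes:
$$\text{logit}(\rho_{r,t,a,s,g}) = \text{(terms of the logistic model above)} - \text{post}_t \, \mathbb{1}[g = \text{non-trauma}] \, \delta_a$$
where $\text{post}_t$ is the post-conflict indicator and $\delta_a$ is a monotone non-decreasing function of age group. In the Stan model this is implemented by modelling positive increments between age groups.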
The result is that, after conflict begins, the model allows the observed VR age distribution to shift younger even when the underlying mortality age schedule remains broadly similar.
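The "positive increments" construction can be sketched as a cumulative sum. In this hypothetical Python illustration, unconstrained values are exponentiated to obtain positive steps and then accumulated. Fixing the youngest group's penalty at zero, the use of `exp`, and all numbers are assumptions made for the example, not details taken from the Stan program.

```python
import math
from itertools import accumulate


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Unconstrained values, one per step between adjacent age groups (hypothetical).
raw_steps = [-1.0, 0.0, 0.5]

# Exponentiating guarantees every increment is strictly positive.
increments = [math.exp(z) for z in raw_steps]

# Cumulative sums give a penalty that can only grow with age.
# Assumption: the youngest age group carries no penalty.
age_penalty = [0.0] + list(accumulate(increments))

# Post-conflict completeness by age group for a non-trauma cause.
logit_before_penalty = 1.5  # hypothetical
completeness_by_age = [inv_logit(logit_before_penalty - d) for d in age_penalty]

for penalty, rho in zip(age_penalty, completeness_by_age):
    print(round(penalty, 3), round(rho, 3))
```

Because the penalty never decreases with age, completeness after conflict starts never increases with age, which is what lets the observed age distribution shift younger.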
Observation model for VR counts
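The model used for fitting is a negative binomial likelihood on the observed counts:
$$y_{r,t,a,s,g} \sim \text{NegBin}(\mu_{r,t,a,s,g}, \, \phi_g)$$
with
$$\mu_{r,t,a,s,g} = E_{r,t,a,s} \, \lambda_{r,t,a,s,g} \, \rho_{r,t,a,s,g}$$
that is, the product of exposure, true death rate, and completeness. The parameter $\phi_g$ is a cause-specific dispersion parameter.
This single likelihood implicitly marginalises over the latent true deaths $D_{r,t,a,s,g}$. You can still interpret the model generatively as “true deaths occur” then “some are recorded”.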
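The marginalisation has a simple Poisson analogue: if true deaths are Poisson with mean exposure times rate, and each is recorded independently with the completeness probability, then recorded deaths are Poisson with mean exposure times rate times completeness. The negative binomial keeps that mean and adds dispersion. The Python sketch below evaluates such a likelihood for one cell. It assumes the mean-and-dispersion parameterisation with variance equal to mean plus mean squared over dispersion (the form Stan calls `neg_binomial_2`); whether the package uses exactly this form is an assumption, and all numbers are hypothetical.

```python
import math


def neg_binomial_log_pmf(y, mean, dispersion):
    """Log pmf of a negative binomial with variance mean + mean**2 / dispersion."""
    return (
        math.lgamma(y + dispersion) - math.lgamma(dispersion) - math.lgamma(y + 1)
        + dispersion * math.log(dispersion / (mean + dispersion))
        + y * math.log(mean / (mean + dispersion))
    )


exposure = 10_000.0   # hypothetical person-months
true_rate = 0.001     # hypothetical deaths per person-month
completeness = 0.6    # hypothetical probability a death is recorded
dispersion = 4.0      # hypothetical cause-specific dispersion

# Expected recorded deaths: exposure times true rate times completeness.
expected_recorded = exposure * true_rate * completeness

# Log-likelihood contribution of observing 3 recorded deaths in this cell.
log_lik = neg_binomial_log_pmf(3, expected_recorded, dispersion)
print(expected_recorded, log_lik)
```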
Identifiability and anchoring
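From the likelihood alone, the true death rate $\lambda$ and the completeness $\rho$ are only weakly identified because they enter the expected count only through their product $\lambda \rho$.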
vrcmort uses several sources of information to separate
them:
- Pre-conflict behaviour: if VR looks stable pre-conflict, it is reasonable to place informative priors on baseline completeness $\kappa^{0}_g$.
- Age structure: age-selective collapse is informative about reporting rather than mortality.
- Smoothness: random walk priors on time effects discourage implausibly sharp swings in the latent mortality rate.
- Covariate separation: covariates believed to reflect system functioning are more naturally placed in the reporting model than the mortality model.
The priors are therefore not an afterthought; they are part of what makes inference possible.
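The identifiability problem can be made concrete with two hypothetical scenarios that imply exactly the same expected number of recorded deaths. The numbers in this Python sketch are invented for illustration.

```python
import math

exposure = 50_000.0  # hypothetical person-months

# Two very different stories: (true death rate, completeness).
scenarios = {
    "high mortality, low completeness": (0.0010, 0.45),
    "low mortality, high completeness": (0.0005, 0.90),
}

# Expected recorded deaths depend only on the product of rate and completeness.
expected_recorded = {
    name: exposure * rate * completeness
    for name, (rate, completeness) in scenarios.items()
}

for name, value in expected_recorded.items():
    print(name, value)
```

A single cell's count cannot tell these scenarios apart; only the additional structure listed above (priors, age patterns, smoothness, covariate placement) can.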
Extensions supported in the current base model
The base Stan program shipped with vrcmort includes
optional structured extensions controlled by the R interface:
- Region-varying conflict effects: partial pooling for the conflict coefficient by region.
- Region-specific time trends: region-specific random walks around a national trend.
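One way to picture region-specific random walks around a national trend is to add a smaller regional random walk to a shared national one. The Python sketch below does this by simulation. It is only one possible construction: the decomposition, the step sizes, and the region names are hypothetical and may differ from how the Stan program parameterises regional trends.

```python
import random


def random_walk(n_steps, step_sd, rng):
    """Cumulative sum of independent Gaussian steps."""
    path, level = [], 0.0
    for _ in range(n_steps):
        level += rng.gauss(0.0, step_sd)
        path.append(level)
    return path


rng = random.Random(11)
n_months = 24

# Shared national time effect on the log-rate scale.
national = random_walk(n_months, 0.10, rng)

# Each region deviates from the national trend through its own, smaller walk.
deviations = {
    region: random_walk(n_months, 0.03, rng) for region in ("north", "south")
}
regional = {
    region: [n + d for n, d in zip(national, deviation)]
    for region, deviation in deviations.items()
}

print([round(v, 3) for v in national[:3]])
print([round(v, 3) for v in regional["north"][:3]])
```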
Other extensions (misclassification between cause groups, population uncertainty, additional observation streams) can be added by extending the Stan components. The vignette on implementation explains how the Stan code is modularised.