Risk Metrics • riskworkflowr

Overview

riskworkflowr includes simple helper functions for calculating commonly used spatial risk metrics.

The current metric functions include:

risk_calc_rate()
risk_calc_poisson_probability()
risk_calc_smr()
risk_calc_location_quotient()

These functions are intended to support transparent and reproducible analytical workflows rather than replace specialist epidemiological or spatial statistical methods.

Example data

data <- data.frame(
  event_count = c(5, 10, 20),
  population = c(1000, 2000, 3000)
)

Rates

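risk_calc_rate() calculates a simple rate:

(observed events / denominator) × multiplier

The call below does not set a multiplier. The rate_per_10000 column in its output indicates that a multiplier of 10,000 is applied.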

risk_calc_rate(
  data = data,
  count_col = "event_count",
  denominator_col = "population"
)

##   event_count population rate_per_10000
## 1           5       1000       50.00000
## 2          10       2000       50.00000
## 3          20       3000       66.66667
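The arithmetic behind this output can be checked in base R. This is a minimal sketch rather than the package's implementation: the multiplier of 10,000 is inferred from the rate_per_10000 column name, and rate_check is a hypothetical name used only for illustration.

```r
# Example data, repeated so the block runs on its own
data <- data.frame(
  event_count = c(5, 10, 20),
  population = c(1000, 2000, 3000)
)

# Assumed multiplier, inferred from the rate_per_10000 column name
multiplier <- 10000

# (observed events / denominator) * multiplier
rate_check <- data$event_count / data$population * multiplier
round(rate_check, 5)
```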

Poisson probability

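risk_calc_poisson_probability() treats an observed event frequency as a Poisson rate and estimates the probability of one or more events occurring in a single period. The example output below is consistent with lambda being event_count divided by years, and prob_event_ge_1 being 1 - exp(-lambda).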

poisson_data <- data.frame(
  event_count = c(1, 5, 10),
  years = c(1, 2, 5)
)

risk_calc_poisson_probability(
  data = poisson_data,
  count_col = "event_count",
  period_col = "years",
  output = "both"
)

##   event_count years lambda prob_event_ge_1 prob_event_ge_1_pct
## 1           1     1    1.0       0.6321206            63.21206
## 2           5     2    2.5       0.9179150            91.79150
## 3          10     5    2.0       0.8646647            86.46647
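The values above can be reproduced with base R under a plain Poisson assumption. The sketch below is not the package's code; the names lambda, prob_ge_1, and prob_ge_1_alt are illustrative, and the formula is inferred from the printed output.

```r
# Example data, repeated so the block runs on its own
poisson_data <- data.frame(
  event_count = c(1, 5, 10),
  years = c(1, 2, 5)
)

# Observed frequency: events per year
lambda <- poisson_data$event_count / poisson_data$years

# P(at least one event in one year) = 1 - P(zero events) = 1 - exp(-lambda)
prob_ge_1 <- 1 - exp(-lambda)

# The same quantity from the Poisson distribution function: P(X > 0)
prob_ge_1_alt <- ppois(0, lambda, lower.tail = FALSE)

data.frame(lambda, prob_ge_1, prob_ge_1_pct = 100 * prob_ge_1)
```

Because the probability depends only on lambda, 10 events in 5 years (lambda = 2) gives a lower probability than 5 events in 2 years (lambda = 2.5), even though the raw count is higher.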

Standardised mortality ratio style workflows

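risk_calc_smr() calculates expected counts and standardised mortality ratio (SMR) style outputs, where SMR is the observed count divided by the expected count. In the example below, the expected counts are consistent with applying the overall rate (35 events across a total denominator of 6,000) to each row's denominator.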

Although named using SMR terminology, the same general structure may be useful for other observed-versus-expected spatial risk workflows where an appropriate denominator is available.

risk_calc_smr(
  data = data,
  observed_col = "event_count",
  denominator_col = "population"
)

##   event_count population expected_count       smr smr_lower smr_upper
## 1           5       1000       5.833333 0.8571429 0.2783120  2.000285
## 2          10       2000      11.666667 0.8571429 0.4110333  1.576316
## 3          20       3000      17.500000 1.1428571 0.6980868  1.765050
##             smr_ci_flag
## 1 not_clearly_different
## 2 not_clearly_different
## 3 not_clearly_different
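The output above can be reproduced in base R. This sketch makes two assumptions that the text does not state: the reference rate is the overall rate across all rows, and the interval is an exact 95% Poisson interval based on chi-square quantiles. Both are inferred from the printed numbers. The includes_one check is a hypothetical reading of how smr_ci_flag might be derived.

```r
# Example data, repeated so the block runs on its own
data <- data.frame(
  event_count = c(5, 10, 20),
  population = c(1000, 2000, 3000)
)

# Reference rate: all events over the total denominator (assumed)
overall_rate <- sum(data$event_count) / sum(data$population)

# Expected count per row, and observed / expected
expected <- data$population * overall_rate
smr <- data$event_count / expected

# Exact Poisson limits for the observed count, scaled by the expected count
# (assumed method and 95% level, inferred from the printed output)
alpha <- 0.05
smr_lower <- qchisq(alpha / 2, 2 * data$event_count) / 2 / expected
smr_upper <- qchisq(1 - alpha / 2, 2 * (data$event_count + 1)) / 2 / expected

# Hypothetical flag logic: does the interval include 1?
includes_one <- smr_lower <= 1 & smr_upper >= 1

data.frame(expected, smr, smr_lower, smr_upper, includes_one)
```

In this example every interval includes 1, which matches the not_clearly_different flag shown for all three rows.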

Distinctive category workflows

risk_distinct_category() supports comparative risk profiling across categories within each spatial unit.

It is intended for datasets where events are grouped by both spatial unit and category, such as injury mechanism, incident type, hazard class, or activity type.

category_data <- data.frame(
  unit_id = c("A", "A", "B", "B"),
  category = c("Falls", "Water", "Falls", "Water"),
  event_count = c(10, 2, 3, 8),
  population = c(1000, 1000, 800, 800)
)

risk_distinct_category(
  data = category_data,
  unit_id_col = "unit_id",
  category_col = "category",
  observed_col = "event_count",
  denominator_col = "population",
  min_count = 1
)

##   unit_id highest_category highest_smr highest_event_count lowest_category
## 1       A            Falls    1.384615                  10           Water
## 2       B            Water    1.800000                   8           Falls
##   lowest_smr lowest_event_count category_count_used insufficient_count_flag
## 1  0.3600000                  2                   2                   FALSE
## 2  0.5192308                  3                   2                   FALSE

For each unit, the output identifies the category with the highest SMR and, optionally, the category with the lowest SMR.
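The SMR values in this output are consistent with computing, for each category, an expected count from the category-wide rate across all units. The base R sketch below follows that reading. It is an assumption inferred from the printed numbers, not the package's implementation, and it ignores min_count.

```r
# Example data, repeated so the block runs on its own
category_data <- data.frame(
  unit_id = c("A", "A", "B", "B"),
  category = c("Falls", "Water", "Falls", "Water"),
  event_count = c(10, 2, 3, 8),
  population = c(1000, 1000, 800, 800)
)

# Category-wide totals: events and population summed over units, per category
cat_events <- ave(category_data$event_count, category_data$category, FUN = sum)
cat_pop <- ave(category_data$population, category_data$category, FUN = sum)

# Expected count = unit population * category-wide rate (assumed)
category_data$expected <- category_data$population * cat_events / cat_pop
category_data$smr <- category_data$event_count / category_data$expected

# For each unit, keep the row with the highest SMR
by_unit <- split(category_data, category_data$unit_id)
highest <- do.call(rbind, lapply(by_unit, function(d) d[which.max(d$smr), ]))
highest[, c("unit_id", "category", "smr", "event_count")]
```

Replacing which.max() with which.min() gives the lowest category for each unit.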

This workflow is intended for exploratory spatial risk profiling and should be interpreted carefully where counts are low, denominators are unstable, or categories are inconsistently coded.

Assumptions

Risk metric workflows assume:

event counts are suitable for the intended analysis
denominators are appropriate and interpretable
spatial units are meaningful for the question being asked
low counts and small denominators are interpreted carefully
outputs are reviewed alongside context and data quality

Limitations and pitfalls

Potential limitations include:

unstable rates in small areas
misleading interpretation of rare events
denominator uncertainty
inconsistent event coding
ecological interpretation risks
overinterpretation of exploratory outputs
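The first of these limitations can be illustrated with a short base R sketch. It uses made-up counts (not package data) that give the same rate of 50 per 10,000 in three areas of different size, and an exact Poisson 95% interval chosen here for illustration.

```r
# Illustrative values only: same rate, very different population sizes
events <- c(1, 10, 100)
population <- c(200, 2000, 20000)
multiplier <- 10000

rate <- events / population * multiplier

# Exact Poisson 95% limits for the count, expressed on the rate scale
lower <- qchisq(0.025, 2 * events) / 2 / population * multiplier
upper <- qchisq(0.975, 2 * (events + 1)) / 2 / population * multiplier

# Interval width shrinks as the count grows, although the rate is identical
data.frame(events, population, rate, lower, upper, width = upper - lower)
```

The point estimate is the same in every row, but the interval for the smallest area is far wider, which is why rates based on very few events are easy to overinterpret.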

Alternative approaches

Depending on the analytical objective and available data, alternative methods may include:

direct standardisation
indirect standardisation
age-sex stratified SMR
Bayesian smoothing
empirical Bayes methods
Poisson or negative binomial regression
spatial regression
cluster detection methods
hierarchical models

These methods may be preferable where sufficient covariate information, stable denominators, or specific inferential objectives exist.

Future development

Future development priorities include:

grouped/category SMR workflows
clearer distinction between SMR and location quotient style calculations
category dominance outputs for mapping
optional low-count thresholds
clearer interpretation flags
enhanced examples and visual outputs