Meteorological Lake Comparison

Table of Contents

  • Data harmonisation, units, and time step
    • Data Sources
    • Reference site justification
  • Metrics
    • Core performance metrics and formulas (paired daily values)
    • Event performance with reference defined subsets
  • Plotting and Diagnostics
    • Time‑series line plots
    • Precipitation hydrographs
    • Distribution plots
    • Cumulative frequency (CDF) plots
    • Scatter plots vs reference
    • Rolling diagnostics
    • Monthly climatology (mean ± IQR)
    • Seasonal analysis
    • Seasonal bias plots
    • Seasonal scatter panels
    • Event‑agreement plots
  • Data availability / overlap
    • Rolling diagnostics logic
  • Software and packages
  • Limitations

Methods

Data wrangling

This section documents how the meteorological datasets were prepared, aligned, and compared so the workflow is transparent and repeatable.

Data harmonisation, units, and time step

Data Sources


Daily meteorological datasets were compiled for Lake Rotorua from ERA5‑Land reanalysis, the Limnotrack buoy (in situ) on Lake Rotorua, NIWA Airport_1770 observations, NIWA Town_40177 observations, and the NIWA Virtual Climate Station Network (VCS_On). ERA5 variables were extracted using aemetools::get_era5_land_point_nz; VCS and reference (Airport and Town) data were downloaded manually from the NIWA “Earth Science New Zealand” datahub (sometimes in separate files, joined later in R) and saved as CSVs in the “data/raw” folder. All datasets were standardised to common variable names (Temp_C, Precip_mm, Wind_Spd_ms, RadSWD_Wm2) with a shared Date field.


Sub‑daily records (5‑ and 15‑minute buoy data and hourly station data) were parsed in UTC and aggregated to daily values, using totals for precipitation and means for temperature, wind speed, and radiation. Radiation in MJ m⁻² was converted to W m⁻² using Rad_MJm2 * 1e6 / 86400 for daily totals and Rad_MJm2 * 1e6 / 3600 for hourly values prior to daily means. Buoy wind speeds were corrected for a documented unit change (knots to m/s before a switch date), and buoy precipitation was converted from mm hr⁻¹ to 5‑minute depth before aggregation. Historical buoy records (15‑minute) were aggregated to daily means, with known incorrect wind periods set to NA to keep a continuous date series. Town_40177 and Airport_1770 were assembled from hourly radiation/temperature (daily means) plus daily wind and rain. VCS_On daily data were converted to numeric, with temperature computed as the mean of Tmin and Tmax. All daily datasets were written as aligned CSVs, which were then compared against each other over their overlapping dates.
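As a quick check on the radiation conversion factors above, the following minimal R sketch applies them directly (the helper names are illustrative, not functions from the project scripts):

```r
# Illustrative helpers (not project functions): MJ m^-2 totals -> mean W m^-2
mj_daily_to_wm2  <- function(rad_mj) rad_mj * 1e6 / 86400  # daily total spread over 86,400 s
mj_hourly_to_wm2 <- function(rad_mj) rad_mj * 1e6 / 3600   # hourly total spread over 3,600 s

mj_daily_to_wm2(20)  # a 20 MJ m^-2 daily total averages ~231.5 W m^-2
mj_hourly_to_wm2(1)  # a 1 MJ m^-2 hourly total averages ~277.8 W m^-2
```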

Reference site justification

Airport station data were used as the primary reference because they are maintained, quality-controlled, and located near the lake, with few data gaps over a long period. All alternative sources (ERA5, VCSN, buoy, Town) were evaluated against this reference over their overlapping dates. The reference source (Airport) is referred to as “obs” or “reference” throughout, while the sources compared against it are referred to as “sim” or “target”.

Metrics

Metrics are computed on paired daily values after joining by Date and dropping NAs. For each target (“sim”) vs reference (“obs”), the following are calculated. The setup begins in the 01_metrics_helpers script and continues, after the 02_prepare_raw_data script, in the 03_analysis_helpers and 04_analysis_plotting scripts.

Core performance metrics and formulas (paired daily values)

Pearson correlation (r) (Pearson 1896) for overall covariation (cor(obs, sim)).

\[r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

Pearson correlation coefficient between the reference series (obs; Airport_1770) and the target dataset (sim). Here, (x_i) denotes the reference value on day (i), (y_i) denotes the target value on day (i), and (\bar{x}), (\bar{y}) are their respective means over the overlap period (after removing missing values). This measures how well the two sources move together.

Linear regression: slope and intercept (scaling and offset bias), lm(sim ~ obs).

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]

Linear regression model used to estimate agreement between the reference series (obs; Airport_1770) and each target dataset (sim). Here, (x_i) is the reference value on day (i), (y_i) is the target value on day (i), (\beta_0) is the intercept, (\beta_1) is the slope, and (\epsilon_i) is the residual error (using paired, non‑missing daily values).

MAE: Mean absolute error, mean(abs(sim - obs)).

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\,y_i - x_i\,\right| \]

Mean Absolute Error (MAE) between the reference series (obs; (x_i)) and the target series (sim; (y_i)) over the overlapping days. Here (n) is the number of paired, non‑missing daily observations. This shows the average magnitude of errors between pairs of observations; the smaller the MAE, the closer the agreement between measurements.

RMSE: Root mean square error, sqrt(mean((sim - obs)^2)).

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2} \]

Root Mean Square Error (RMSE) between the reference series (obs; (x_i)) and the target series (sim; (y_i)) over the overlapping days. Here (n) is the number of paired, non‑missing daily observations.

Lin’s concordance correlation coefficient (CCC) combines accuracy and precision (L. I.-K. Lin 1989). Values were interpreted on the McBride (2005) scale: > 0.99 almost perfect; 0.95–0.99 substantial; 0.90–0.95 moderate; < 0.90 poor strength of agreement (Rekasi 2024). CCC penalises both bias and mismatched variance and is correlation dependent; it assesses how well two sets of observations fall on the line of perfect agreement (y = x).

\[ \text{CCC} = \frac{2 r \sigma_{obs} \sigma_{sim}}{\sigma_{obs}^2 + \sigma_{sim}^2 + (\mu_{obs} - \mu_{sim})^2} \]

Bias: mean(sim - obs), with relative bias reported when mean(obs) is non-zero. Bias indicates the direction of a systematic error, i.e. how far the dataset sits above or below the reference on average.

Relative bias: gives context to the bias, expressed as a percent (%) of the mean observed value: 100 × bias / mean(obs). When considering light versus heavy rain, relative bias shows the practical importance of an error: a bias of +1 mm/day is huge when tracking only light rain, but trivial when looking for heavy rain events.
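The core metrics above can be reproduced on a pair of daily vectors in a few lines of base R; a minimal sketch, using illustrative values rather than project data:

```r
# Illustrative paired daily values (obs = reference, sim = target)
obs <- c(2.1, 0.0, 5.3, 1.2, 8.4, 0.5)
sim <- c(1.8, 0.2, 6.0, 1.0, 7.5, 0.9)

r    <- cor(obs, sim)             # Pearson correlation
fit  <- lm(sim ~ obs)             # slope/intercept (scaling and offset bias)
mae  <- mean(abs(sim - obs))      # mean absolute error
rmse <- sqrt(mean((sim - obs)^2)) # root mean square error
bias <- mean(sim - obs)           # mean bias (sim - obs)
rel_bias <- 100 * bias / mean(obs)

# Lin's CCC from its closed form (population variances, denominator n)
mu_o <- mean(obs); mu_s <- mean(sim)
v_o  <- mean((obs - mu_o)^2)
v_s  <- mean((sim - mu_s)^2)
ccc  <- 2 * r * sqrt(v_o * v_s) / (v_o + v_s + (mu_o - mu_s)^2)
```

For positively correlated series CCC never exceeds r, since it discounts the correlation for any mean offset or variance mismatch.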

Event performance with reference defined subsets

Wet‑day precipitation metrics were computed only on days when reference (Airport_1770) precipitation exceeded 1 mm (set with wet_threshold_mm = 1), to focus on performance during notable rain events.

Windy‑day metrics used the top 10% of reference wind speeds (windy_top_pct = 0.10), i.e. days at or above the 90th percentile of reference (Airport) wind during the overlap period; metrics for the windy_day column were computed only on those high‑wind days. (A fixed threshold of 10 m/s is coded as an alternative but not used.)

For example, consider ERA5 vs Airport for precipitation, where Airport precipitation is the reference (“obs”) series and ERA5 precipitation is the source being evaluated (“sim”). metrics_all uses every day where both datasets have values after NA removal (baseline overall performance). metrics_wet uses only days where the reference has “meaningful rain”: with a 1 mm threshold, it keeps days where Airport precipitation > 1 mm. This answers: “How well does ERA5 match rainy days?” (which usually matters more for modelling than dry days). metrics_windy uses only days where the reference indicates high wind: with a 10 m/s threshold, it keeps days where Airport wind speed ≥ 10 m/s. This answers: “How well does ERA5 match wind‑storm days?”, which matters because overall correlation may look high while performance is poor in seasonal extremes and wind‑ or rain‑heavy events.
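A minimal sketch of how the reference-defined subsets are formed (the vectors here are illustrative, not project data):

```r
# Only the reference (obs) series defines the event subsets
obs_precip <- c(0.0, 0.4, 3.2, 12.0, 0.0, 1.5)
obs_wind   <- c(2.1, 9.8, 4.4, 11.2, 6.3, 10.5)

wet_threshold_mm <- 1
windy_top_pct    <- 0.10

wet_days   <- obs_precip > wet_threshold_mm          # wet-day mask (3 of 6 days here)
windy_cut  <- quantile(obs_wind, 1 - windy_top_pct)  # 90th percentile of reference wind
windy_days <- obs_wind >= windy_cut                  # top-decile wind-day mask

# Subset metrics are then computed only on pairs where the mask is TRUE,
# e.g. metrics_wet compares obs[wet_days] against sim[wet_days].
```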

Event‑agreement statistics (Jolliffe 2003) for precipitation (threshold ≥ 1 mm/day, as set in the event‑skill section) include:

Hits (H) (ref event & target event), both obs and sim say “event”

Misses (M) (ref event & target no event), obs says “event” but sim does not

False alarms (FA), sim says “event” but obs does not

Correct negatives (CN), both say “no event”

POD (Probability of Detection) = H / (H + M)
“Of the true events, how many did we catch?”

FAR (False Alarm Ratio) = FA / (H + FA)
“Of the predicted events, how many were false?”

CSI (Critical Success Index) = H / (H + M + FA)
“Overall event accuracy, ignoring correct negatives.” >0.7 = strong, 0.4–0.7 = moderate, <0.4 = weak

Bias score = (H + FA) / (H + M)
“Do we predict too many events (>1) or too few (<1)?”

These quantify detection performance separately from continuous error metrics.
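A worked example with hypothetical counts shows how the four scores fall out of the contingency table:

```r
# Hypothetical contingency counts for illustration (not project results)
H <- 40; M <- 10; FA <- 20; CN <- 300

pod        <- H / (H + M)        # 40/50 = 0.80: caught 80% of true events
far        <- FA / (H + FA)      # 20/60 ~ 0.33: a third of predicted events were false
csi        <- H / (H + M + FA)   # 40/70 ~ 0.57: moderate overall event accuracy
bias_score <- (H + FA) / (H + M) # 60/50 = 1.20: predicts slightly too many events
```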

Plotting and Diagnostics


Plots are generated from overlap‑aligned daily data to ensure consistent comparisons. Where plotting code is shown, the following objects are used:

  • ref_df: reference dataset (Airport_1770) in daily format.
  • targets_list: list of target datasets (ERA5, Buoy, Town_40177, VCS_On).
  • var: variable name string, e.g. "Temp_C", "Wind_Spd_ms", "Precip_mm".
  • make_long_overlap(): helper that aligns datasets to overlapping dates and returns a long table with Date, value, and source.
  • target_colors: named palette used consistently across plots.

Show example code: Plotting setup
library(tidyverse)
library(slider)
library(plotly)

# --- Source helpers (must exist relative to project root) ---
# run source below if not using quarto file
source("scripts/01_metrics_helpers.R")
source("scripts/03_analysis_helpers.R")

# -----------------------------
# Load processed daily data
# -----------------------------
era5     <- read_csv("data/processed/rotorua_era5_daily.csv", show_col_types = FALSE)
buoy     <- read_csv("data/processed/rotorua_buoy_daily.csv", show_col_types = FALSE)
ap1770   <- read_csv("data/processed/rotorua_airport_1770_daily.csv", show_col_types = FALSE)
twn40177 <- read_csv("data/processed/rotorua_town_40177_daily.csv", show_col_types = FALSE)
vcs_on   <- read_csv("data/processed/rotorua_vcs_on_daily.csv", show_col_types = FALSE)

#----------------------------------------
# Creating additional functions for plotting and later use in quarto
#----------------------------------------

# Enforce Date type (prevents join issues)
to_date <- function(df) { df$Date <- as.Date(df$Date); df }
era5     <- to_date(era5)
buoy     <- to_date(buoy)
ap1770   <- to_date(ap1770)
twn40177 <- to_date(twn40177)
vcs_on   <- to_date(vcs_on)
# -----------------------------
# Reference + targets
# -----------------------------
ref_df <- ap1770

targets_list <- list(
  ERA5       = era5,
  Buoy       = buoy,
  Town_40177 = twn40177,
  VCS_On     = vcs_on
)

target_colors <- c(
  Airport_1770 = "black",
  ERA5         = "#d95f02",
  Buoy         = "#7570b3",
  Town_40177   = "#f0c400",
  VCS_On       = "#1b9e77"
)

vars <- c("Temp_C", "Precip_mm", "Wind_Spd_ms", "RadSWD_Wm2")

# -----------------------------
# Metrics table (ready for Quarto)
# -----------------------------
metrics_all <- purrr::imap_dfr(
  targets_list,
  ~ metrics_vs_ref(
    ref_df      = ref_df,
    target_df   = .x,
    target_name = .y,
    vars = vars,
    wet_threshold_mm = 1,
    windy_top_pct    = 0.10
  )
)

#print(metrics_all)

# ============================================================
# Plot helpers
# ============================================================

# -----------------------------
# Helper: overlap-aligned long table
# -----------------------------
# Why: comparisons must be on the SAME dates where the reference exists (honest overlap).
make_long_overlap <- function(ref_df, targets_list, var, ref_name = "Airport_1770") {

  ref_keep <- ref_df |>
    select(Date, value = all_of(var)) |>
    mutate(source = ref_name)

  targets_keep <- purrr::imap_dfr(
    targets_list,
    ~ .x |>
      select(Date, value = all_of(var)) |>
      mutate(source = .y)
  )

  df <- bind_rows(ref_keep, targets_keep) |>
    drop_na(value)

  ref_dates <- ref_keep |>
    drop_na(value) |>
    distinct(Date)

  df |> semi_join(ref_dates, by = "Date")
}

Time‑series line plots

For Temp_C, Wind_Spd_ms, and RadSWD_Wm2 (faceted by dataset).

Show example code: Time series plots
plot_timeseries_line <- function(ref_df, targets_list, var, ref_name = "Airport_1770") {
  df <- make_long_overlap(ref_df, targets_list, var, ref_name = ref_name)

  ggplot(df, aes(Date, value, color = source)) +
    geom_line(alpha = 0.85) +
    facet_wrap(~ source, ncol = 1, scales = "fixed") +
    scale_color_manual(values = target_colors) +
    labs(
      title = paste0(var, " over time (aligned to reference overlap)"),
      x = NULL, y = var
    ) +
    theme_bw() +
    theme(legend.position = "none")
}

Precipitation hydrographs

Daily hydrograph and 7-day rolling totals.

Show example code: Precipitation
# -----------------------------
# Precipitation plot mode
# display = "bars"    -> daily totals (mm/day)
# display = "rolling" -> rolling totals (mm/window_days)
# -----------------------------
plot_precip_hydrograph <- function(ref_df,
                                   targets_list,
                                   ref_name = "Airport_1770",
                                   display = c("bars", "rolling"),
                                   window_days = 7,
                                   show_7day_sum = NULL) {

  # Backward compatibility: older calls used show_7day_sum; newer calls use display = "bars" or "rolling".
  if (!is.null(show_7day_sum)) {
    display <- if (isTRUE(show_7day_sum)) "rolling" else "bars"
  } else {
    display <- match.arg(display)
  }

  df <- make_long_overlap(ref_df, targets_list, "Precip_mm", ref_name = ref_name)

  p <- ggplot() +
    facet_wrap(~ source, ncol = 1, scales = "fixed") +
    theme_bw() +
    theme(legend.position = "none")

  if (display == "bars") {
    p <- p +
      geom_col(
        data = df,
        aes(Date, value, fill = source),
        alpha = 0.55
      ) +
      scale_fill_manual(values = target_colors) +
      labs(
        title = "Precipitation hydrograph (daily totals)",
        x = NULL,
        y = "Precip (mm/day)"
      )
  } else {
    df_roll <- df |>
      arrange(source, Date) |>
      group_by(source) |>
      mutate(
        roll_sum = slider::slide_dbl(
          value, ~ sum(.x, na.rm = TRUE),
          .before = window_days - 1,
          .complete = TRUE
        )
      ) |>
      ungroup()

    p <- p +
      geom_line(
        data = df_roll,
        aes(Date, roll_sum, color = source),
        linewidth = 0.5,
        alpha = 0.95
      ) +
      scale_color_manual(values = target_colors) +
      labs(
        title = paste0("Precipitation hydrograph (", window_days, "-day rolling totals)"),
        x = NULL,
        y = paste0("Precip (mm/", window_days, " days)")
      )
  }

  p
}

Distribution plots

Kernel density overlays (geom_density) for temperature and wind; zero‑inflated precipitation histograms for all days; and wet‑day precipitation histograms (ref > 1 mm) (Scott 2015).

Show example function: plot_distribution()
plot_distribution <- function(ref_df, targets_list, var, ref_name = "Airport_1770") {
  df <- make_long_overlap(ref_df, targets_list, var, ref_name = ref_name)

  if (var == "Precip_mm") {
    ggplot(df, aes(value, fill = source)) +
      geom_histogram(bins = 40, alpha = 0.5, position = "identity") +
      scale_fill_manual(values = target_colors) +
      labs(
        title = "Precip distribution (all days; zero-inflated)",
        x = "Precip (mm/day)", y = "Count"
      ) +
      theme_bw()
  } else {
    ggplot(df, aes(value, color = source)) +
      geom_density(linewidth = 1) +
      scale_color_manual(values = target_colors) +
      labs(
        title = paste0(var, " density (overlap with reference)"),
        x = var, y = "Density"
      ) +
      theme_bw()
  }
}

plot_precip_wetday_distribution <- function(ref_df, targets_list, threshold_mm = 1, ref_name = "Airport_1770") {
  df <- make_long_overlap(ref_df, targets_list, "Precip_mm", ref_name = ref_name)

  # Wet days defined using reference only (critical for fair event evaluation)
  ref_wet_dates <- ref_df |>
    select(Date, Precip_mm) |>
    drop_na(Precip_mm) |>
    filter(Precip_mm > threshold_mm) |>
    distinct(Date)

  df_wet <- df |> semi_join(ref_wet_dates, by = "Date")

  ggplot(df_wet, aes(value, fill = source)) +
    geom_histogram(bins = 40, alpha = 0.5, position = "identity") +
    scale_fill_manual(values = target_colors) +
    facet_wrap(~ source, ncol = 1, scales = "fixed") +
    labs(
      title = paste0("Precip distribution on wet days only (ref > ", threshold_mm, " mm)"),
      x = "Precip (mm/day)", y = "Count"
    ) +
    theme_bw()+
    theme(legend.position = "none")
}

Cumulative frequency (CDF) plots

For precipitation.

Show example function: plot_precip_cdf()
plot_precip_cdf <- function(ref_df, targets_list, ref_name = "Airport_1770") {
  df <- make_long_overlap(ref_df, targets_list, "Precip_mm", ref_name = ref_name)

  ggplot(df, aes(value, color = source)) +
    stat_ecdf(geom = "step", pad = FALSE, linewidth = 1) +
    scale_color_manual(values = target_colors) +
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    labs(
      title = "Cumulative frequency of daily precipitation",
      x = "Precip (mm/day)",
      y = "Cumulative probability"
    ) +
    theme_bw()
}

Scatter plots vs reference

With 1:1 line and fitted regression, annotated with n, R, RMSE, and bias.

Show example code: Scatter plots
plot_scatter_vs_ref_single <- function(ref_df, target_df, target_name, var) {
  joined <- inner_join(
    ref_df |> select(Date, ref = all_of(var)),
    target_df |> select(Date, tgt = all_of(var)),
    by = "Date"
  ) |> drop_na(ref, tgt)
  
  ggplot(joined, aes(ref, tgt)) +
    geom_point(alpha = 0.35) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    geom_smooth(method = "lm", se = TRUE) +
    labs(
      title = paste0(var, ": ", target_name, " vs Airport_1770"),
      x = "Reference (Airport_1770)",
      y = paste0(target_name)
    ) +
    theme_bw()
}
# -----------------------------
plot_scatter_faceted <- function(ref_df, targets_list, var, ref_name = "Airport_1770", ncol = 2) {

  paired <- purrr::imap_dfr(
    targets_list,
    ~ inner_join(
      ref_df |> select(Date, ref = all_of(var)),
      .x     |> select(Date, tgt = all_of(var)),
      by = "Date"
    ) |>
      drop_na(ref, tgt) |>
      mutate(source = .y)
  )

  stats_df <- paired |>
    group_by(source) |>
    summarise(
      n_val    = n(),
      cor_val  = if (n_val > 1) cor(ref, tgt) else NA_real_,
      rmse_val = sqrt(mean((tgt - ref)^2)),
      bias_val = mean(tgt - ref),
      label = paste0(
        "n = ", n_val,
        "\nR = ", ifelse(is.na(cor_val), "NA", sprintf("%.3f", cor_val)),
        "\nRMSE = ", sprintf("%.3f", rmse_val),
        "\nBias = ", sprintf("%.3f", bias_val)
      ),
      .groups = "drop"
    )

  ggplot(paired, aes(ref, tgt)) +
    geom_point(aes(color = source), alpha = 0.35) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    geom_smooth(method = "lm", se = TRUE) +
    geom_text(
      data = stats_df,
      aes(x = Inf, y = -Inf, label = label),
      inherit.aes = FALSE,
      hjust = 1.1, vjust = -0.2,
      color = "black" # fixed annotation color
    ) +
    facet_wrap(~ source, ncol = ncol, scales = "fixed") +
    scale_color_manual(values = target_colors) +
    coord_equal() +
    labs(
      title = paste0(var, ": datasets vs ", ref_name),
      x = paste0("Reference (", ref_name, ")"),
      y = paste0("Target (", var, ")")
    ) +
    theme_bw() +
    theme(legend.position = "none")
}

Rolling diagnostics

30‑day window of bias, correlation, MAE, and RMSE to evaluate temporal drift.

Show example code: Rolling diagnostics
rolling_diagnostics <- function(ref_df, target_df, var, window_days = 30) {

  joined <- inner_join(
    ref_df    |> select(Date, ref = all_of(var)),
    target_df |> select(Date, tgt = all_of(var)),
    by = "Date"
  ) |>
    arrange(Date) |>
    drop_na(ref, tgt) |>
    mutate(err = tgt - ref)

  joined |>
    mutate(
      mae  = slider::slide_dbl(abs(err), mean, .before = window_days - 1, .complete = TRUE),
      rmse = slider::slide_dbl(err^2, ~ sqrt(mean(.x)), .before = window_days - 1, .complete = TRUE),
      bias = slider::slide_dbl(err, mean, .before = window_days - 1, .complete = TRUE),
      cor  = slider::slide2_dbl(
        ref, tgt,
        ~ if (sum(is.finite(.x) & is.finite(.y)) >= 3) cor(.x, .y, use = "complete.obs") else NA_real_,
        .before = window_days - 1,
        .complete = TRUE
      )
    ) |>
    select(Date, mae, rmse, bias, cor)
}

plot_rolling_diagnostics <- function(ref_df, target_df, target_name, var, window_days = 30) {

  df <- rolling_diagnostics(ref_df, target_df, var, window_days) |>
    pivot_longer(c(mae, rmse, bias, cor), names_to = "metric", values_to = "value")

  ggplot(df, aes(Date, value)) +
    geom_line() +
    facet_wrap(~ metric, scales = "free_y", ncol = 1) +
    labs(
      title = paste0("Rolling ", window_days, "-day diagnostics: ", target_name, " vs Airport_1770 (", var, ")"),
      x = NULL, y = NULL
    ) +
    theme_bw()
}

Monthly climatology (mean ± IQR)

For temperature, wind, and precipitation.

Show example function: make_monthly_climatology
make_monthly_climatology <- function(ref_df, targets_list, var,
                                     ref_name = "Airport_1770", ribbon = 0.15) {
  df <- make_long_overlap(ref_df, targets_list, var, ref_name = ref_name) |>
    mutate(
      month = month(Date),
      month_lab = factor(month.abb[month], levels = month.abb)
    )

  df |>
    group_by(source, month, month_lab) |>
    summarise(
      n = n(),
      mean = mean(value, na.rm = TRUE),
      q25  = quantile(value, 0.25, na.rm = TRUE, type = 7),
      q75  = quantile(value, 0.75, na.rm = TRUE, type = 7),
      .groups = "drop"
    )
}

plot_monthly_climatology <- function(ref_df, targets_list, var,
                                     ref_name = "Airport_1770",
                                     ribbon_alpha = 0.15) {

  clim <- make_monthly_climatology(ref_df, targets_list, var, ref_name)

  # Make one consistent palette including the reference
  pal <- c(setNames("black", ref_name), target_colors)

  ggplot(clim, aes(month_lab, mean, group = source, color = source)) +
    # ribbon uses fill but we HIDE its legend so only one legend remains
    geom_ribbon(aes(ymin = q25, ymax = q75, fill = source),
                alpha = ribbon_alpha, color = NA, show.legend = FALSE) +
    geom_line(linewidth = 0.9) +
    geom_point(size = 2) +
    scale_color_manual(values = pal) +
    scale_fill_manual(values = pal) +
    labs(
      title = paste0("Monthly climatology: ", var, " (mean +/- IQR)"),
      x = NULL, y = var,
      color = "Dataset"
    ) +
    theme_bw() +
    theme(legend.position = "right")
}

Seasonal analysis

Define warm/cool periods and seasonal monthly bins.

Show example code: Seasonal/climatology selection
season_defs_default <- list(
  summer = c(12, 1, 2),  # Dec, Jan, Feb
  autumn = c(3, 4, 5),   # Mar, Apr, May
  winter = c(6, 7, 8),   # Jun, Jul, Aug
  spring = c(9, 10, 11)  # Sep, Oct, Nov
)

season_defs_warmcool <- list(
  warm = c(11, 12, 1, 2, 3, 4),    #Nov, Dec, Jan, Feb, March, April
  cool = c(5, 6, 7, 8, 9, 10)      #May, June, July, Aug, Sep, Oct
)

assign_season <- function(date, season_defs = season_defs_default) {
  m <- month(date)
  out <- rep(NA_character_, length(m))
  for (nm in names(season_defs)) {
    out[m %in% season_defs[[nm]]] <- nm
  }
  factor(out, levels = names(season_defs))
}

Seasonal bias plots

Target − reference by seasons.

Show example code: Seasonal bias
plot_seasonal_bias <- function(ref_df, targets_list, var,
                              season_defs = season_defs_default,
                              ref_name = "Airport_1770") {

  paired <- imap_dfr(
    targets_list,
    ~ inner_join(
      ref_df |> select(Date, ref = all_of(var)),
      .x     |> select(Date, tgt = all_of(var)),
      by = "Date"
    ) |>
      drop_na(ref, tgt) |>
      mutate(target = .y, season = assign_season(Date, season_defs),
             bias = tgt - ref)
  )

  summ <- paired |>
    group_by(target, season) |>
    summarise(
      n = n(),
      bias_mean = mean(bias, na.rm = TRUE),
      bias_q25  = quantile(bias, 0.25, na.rm = TRUE),
      bias_q75  = quantile(bias, 0.75, na.rm = TRUE),
      .groups = "drop"
    )

  ggplot(summ, aes(season, bias_mean, fill = target)) +
    geom_col(position = position_dodge(width = 0.8), width = 0.7) +
    geom_errorbar(
      aes(ymin = bias_q25, ymax = bias_q75),
      position = position_dodge(width = 0.8),
      width = 0.2
    ) +
    scale_fill_manual(values = target_colors) +
    labs(
      title = paste0("Seasonal bias: ", var, " (target - reference)"),
      x = NULL, y = "Bias", fill = "Dataset"
    ) +
    theme_bw()
}

Seasonal scatter panels

For wind and precipitation, showing seasonal structure in agreement.

Show example code: Seasonal scatter
plot_seasonal_scatter <- function(ref_df, targets_list, var,
                                 season_defs = season_defs_default,
                                 ref_name = "Airport_1770") {

  paired <- imap_dfr(
    targets_list,
    ~ inner_join(
      ref_df |> select(Date, ref = all_of(var)),
      .x     |> select(Date, tgt = all_of(var)),
      by = "Date"
    ) |>
      drop_na(ref, tgt) |>
      mutate(target = .y, season = assign_season(Date, season_defs))
  )

  ggplot(paired, aes(ref, tgt)) +
    geom_point(alpha = 0.25) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
    geom_smooth(method = "lm", se = FALSE) +
    facet_grid(season ~ target) +
    coord_equal() +
    labs(
      title = paste0("Seasonal scatter: ", var, " (datasets vs reference)"),
      x = paste0("Reference (", ref_name, ")"),
      y = "Target"
    ) +
    theme_bw() +
    theme(legend.position = "none")
}

Event‑agreement plots

For precipitation: stacked outcome counts (hits/misses/false alarms/correct negatives), skill score bars (POD, FAR, CSI, bias score), and false‑alarm magnitude histograms (log1p scale).

Show example code: Event agreement
event_skill <- function(ref_df, target_df, var, threshold,
                        direction = c(">", ">=", "<", "<=")) {
  direction <- match.arg(direction)

  joined <- inner_join(
    ref_df    |> select(Date, ref = all_of(var)),
    target_df |> select(Date, tgt = all_of(var)),
    by = "Date"
  ) |>
    drop_na(ref, tgt)

  ref_event <- switch(
    direction,
    ">"  = joined$ref >  threshold,
    ">=" = joined$ref >= threshold,
    "<"  = joined$ref <  threshold,
    "<=" = joined$ref <= threshold
  )

  tgt_event <- switch(
    direction,
    ">"  = joined$tgt >  threshold,
    ">=" = joined$tgt >= threshold,
    "<"  = joined$tgt <  threshold,
    "<=" = joined$tgt <= threshold
  )

  hits  <- sum(ref_event & tgt_event, na.rm = TRUE)
  miss  <- sum(ref_event & !tgt_event, na.rm = TRUE)
  fa    <- sum(!ref_event & tgt_event, na.rm = TRUE)
  cn    <- sum(!ref_event & !tgt_event, na.rm = TRUE)

  pod <- if ((hits + miss) > 0) hits / (hits + miss) else NA_real_
  far <- if ((hits + fa)   > 0) fa   / (hits + fa)   else NA_real_
  csi <- if ((hits + miss + fa) > 0) hits / (hits + miss + fa) else NA_real_
  bias_score <- if ((hits + miss) > 0) (hits + fa) / (hits + miss) else NA_real_

  tibble(
    n = nrow(joined),
    hits = hits, misses = miss, false_alarms = fa, correct_neg = cn,
    POD = pod, FAR = far, CSI = csi, bias_score = bias_score
  )
}

event_skill_all_targets <- function(ref_df, targets_list, var, threshold,
                                   direction = ">=",
                                   ref_name = "Airport_1770") {
  imap_dfr(targets_list, ~ {
    event_skill(ref_df, .x, var = var, threshold = threshold, direction = direction) |>
      mutate(target = .y, var = var, threshold = threshold, ref = ref_name, .before = 1)
  })
}

# --- Plot: stacked counts of event outcomes per target ---
plot_event_outcomes <- function(skill_tbl, title = NULL) {
  long <- skill_tbl |>
    select(target, hits, misses, false_alarms, correct_neg) |>
    pivot_longer(-target, names_to = "outcome", values_to = "count")

  ggplot(long, aes(target, count, fill = outcome)) +
    geom_col() +
    coord_flip() +
    labs(
      title = title %||% "Event agreement outcomes",
      x = NULL, y = "Count (days)", fill = NULL
    ) +
    theme_bw()
}

# --- Plot: POD/FAR/CSI per target (quick skill comparison) ---
plot_event_skill_scores <- function(skill_tbl, title = NULL) {
  long <- skill_tbl |>
    select(target, POD, FAR, CSI, bias_score) |>
    pivot_longer(-target, names_to = "metric", values_to = "value")

  ggplot(long, aes(target, value)) +
    geom_col() +
    facet_wrap(~ metric, ncol = 2, scales = "free_y") +
    coord_flip() +
    labs(
      title = title %||% "Event skill scores",
      x = NULL, y = NULL
    ) +
    theme_bw()
}

# --- Magnitude when disagreement occurs (precip example) ---
# "Ref dry but target wet": how much rain do they report?
plot_false_alarm_magnitude <- function(ref_df, targets_list,
                                      threshold_mm = 1,
                                      ref_name = "Airport_1770") {

  # Build paired dataset
  paired <- imap_dfr(
    targets_list,
    ~ inner_join(
      ref_df |> select(Date, ref = Precip_mm),
      .x     |> select(Date, tgt = Precip_mm),
      by = "Date"
    ) |>
      drop_na(ref, tgt) |>
      mutate(target = .y)
  )

  fa <- paired |>
    filter(ref <= threshold_mm, tgt > threshold_mm)

  ggplot(fa, aes(tgt, fill = target)) +
    geom_histogram(bins = 40, alpha = 0.6, position = "identity") +
    scale_x_continuous(trans = "log1p") +
    scale_fill_manual(values = target_colors) +
    labs(
      title = paste0("False-alarm magnitude: ref ≤ ", threshold_mm, " mm but target > ", threshold_mm, " mm"),
      x = "Target precip (mm/day) [log1p]", y = "Count", fill = "Dataset"
    ) +
    theme_bw()
}

# --- Example calls (precip event threshold = 1 mm; easy to change) ---
precip_event_threshold <- 1  # change to 0.1 or 5 mm for sensitivity tests later

skill_precip <- event_skill_all_targets(
  ref_df, targets_list,
  var = "Precip_mm",
  threshold = precip_event_threshold,
  direction = ">=")

Data availability / overlap

Only dates present in both the reference and the compared dataset are used. Time series and distribution plots use the reference dates as the overlap anchor. Coverage tables report first/last date and number of overlapping days for transparency.

Rolling diagnostics logic

To detect drift through time, moving-window diagnostics (window = 30 days by default) track MAE, RMSE, bias, and correlation. The rolling window is trailing, uses only complete windows, and requires a minimum number of paired values for correlation. These plots help identify regime shifts, seasonal dependence, or gradual timing offsets.

Software and packages

Analyses were performed in R (R Core Team 2024). Data wrangling and visualization use tidyverse packages (dplyr, tidyr, ggplot2, readr, purrr, tibble, lubridate) (Wickham et al. 2019). Spatial processing relies on the sf framework (Pebesma 2026), with leaflet (Cheng et al. 2025), tmap (Tennekes 2025), osmdata (Maspons et al. 2025), and rnaturalearth (Massicotte and South 2026) used for mapping. Interactive figures are produced with plotly (Sievert et al. 2025) and tables with reactable (G. Lin 2025) and gt (Iannone et al. 2026). Reproducibility is managed with renv (Ushey and Wickham 2025), and report generation uses Quarto with knitr/rmarkdown and htmltools (Cheng et al. 2024); here is used for stable file paths.

Limitations

Spatial representativeness differs: point sensors (buoy, airport) capture local effects, while gridded/reanalysis products smooth extremes. Differences in wind measurement locations and heights produce dataset differences, and precipitation measurements are subject to topographic and equipment effects. Residual timestamp misalignment could affect daily aggregates despite UTC standardisation. Zero-inflated precipitation can inflate r; wet-day metrics mitigate, but do not eliminate, this effect. Findings depend on the overlapping period; results may change with different seasons or longer records. Wet/windy thresholds are pragmatic; sensitivity checks are recommended if thresholds influence decisions. If there are too few overlapping dates, or a variable is entirely missing, plots will not render. For example, radiation for Rotorua went through the same processing as the other variables, but the overlap was too short for the diagnostics to run and too small for statistical evaluation.

Although ground-based meteorological datasets are usually considered the best reference, they are often plagued by gaps and can suffer from the limitations of routine record keeping and maintenance.

Future users should adapt thresholds, reference choice, and aggregation rules to their lake, documenting any deviations.

References

Cheng, Joe, Barret Schloerke, Bhaskar Karambelkar, Yihui Xie, and Garrick Aden-Buie. 2025. Leaflet: Create Interactive Web Maps with the JavaScript Leaflet Library. https://rstudio.github.io/leaflet/.
Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. Htmltools: Tools for HTML. https://github.com/rstudio/htmltools.
Iannone, Richard, Joe Cheng, Barret Schloerke, Shannon Haughton, Ellis Hughes, Alexandra Lauer, Romain François, JooYoung Seo, Ken Brevoort, and Olivier Roy. 2026. Gt: Easily Create Presentation-Ready Display Tables. https://gt.rstudio.com.
Jolliffe, Ian T., ed. 2003. Forecast Verification: A Practitioner’s Guide in Atmospheric Science. Repr. Chichester: Wiley.
Lin, Greg. 2025. Reactable: Interactive Data Tables for R. https://glin.github.io/reactable/.
Lin, Lawrence I-Kuei. 1989. “A Concordance Correlation Coefficient to Evaluate Reproducibility.” Biometrics 45 (1): 255–68. https://doi.org/10.2307/2532051.
Maspons, Joan, Mark Padgham, Bob Rudis, Robin Lovelace, and Maëlle Salmon. 2025. Osmdata: Import OpenStreetMap Data as Simple Features or Spatial Objects. https://docs.ropensci.org/osmdata/.
Massicotte, Philippe, and Andy South. 2026. Rnaturalearth: World Map Data from Natural Earth. https://docs.ropensci.org/rnaturalearth/.
Pearson, Karl. 1896. “Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia.” Philosophical Transactions of the Royal Society of London. Series A 187: 253–318.
Pebesma, Edzer. 2026. Sf: Simple Features for R. https://r-spatial.github.io/sf/.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Rekasi, Andrea. 2024. “Step-by-Step Guide to Using the Concordance Correlation Coefficient | Python-Bloggers.” https://python-bloggers.com/2024/10/step-by-step-guide-to-using-the-concordance-correlation-coefficient/.
Scott, David W. 2015. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2025. Plotly: Create Interactive Web Graphics via Plotly.js. https://plotly-r.com.
Tennekes, Martijn. 2025. Tmap: Thematic Maps. https://github.com/r-tmap/tmap.
Ushey, Kevin, and Hadley Wickham. 2025. Renv: Project Environments. https://rstudio.github.io/renv/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.