Reproducibility / How to reuse

Use this guide to rerun the Rotorua workflow or adapt it to a new lake. The project flow is:

Prepare raw data into daily, standardized CSVs (scripts/02_prepare_raw_data.R).
Load the processed files in the helper scripts (scripts/03_analysis_helpers.R and scripts/04_analysis_plotting.R).
Source the helper scripts in the QMDs, compute metrics, and render plots.
Render the site with Quarto.

Alternatively, run scripts/05_analysis_rotorua_plots.R to print the plots without rendering the QMDs.

This page separates required changes to scripts (data prep and helper code) from changes to QMDs/YAML (report configuration).

Setup (once per machine)

Install R (>= 4.4) and Quarto.
From the project root, run scripts/00_project_setup.R to restore renv and required GitHub packages.
If renv::restore() fails because the configured package repository is unreachable, set a reachable CRAN mirror with options(repos = ...) and try again.
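
A minimal sketch of that retry, assuming the public mirror https://cloud.r-project.org is reachable from your network (substitute your own mirror if it is not). The file.exists() guard is added for illustration only, so the snippet does nothing when run outside a renv project:

```r
# Point R at a CRAN mirror that is reachable from your network (assumed URL).
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Retry the restore only when a lockfile and renv itself are available.
if (file.exists("renv.lock") && requireNamespace("renv", quietly = TRUE)) {
  renv::restore(prompt = FALSE)  # reinstall the versions recorded in renv.lock
}
```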

Changes for a new lake: scripts

Make these edits in scripts/ to reuse the pipeline for another lake. They affect data creation and helper defaults.

scripts/02_prepare_raw_data.R

Update the raw file paths under data/raw/ to your new lake folders and filenames.
Update station IDs in file paths and any column name mappings that differ.
Keep the UTC conversion and daily aggregation rules, using means for state variables (Temp_C, Wind_Spd_ms, RadSWD_Wm2) and sums for flux variables (Precip_mm).
Keep or update unit conversions, including wind knots to m/s if needed, sub-daily precip rates to daily totals, and radiation MJ/m^2 to W/m^2 when required.
Update any bad-data filtering to match your lake (e.g., remove known corrupt periods).
Update the output filenames at the end so they match your new lake ID (for example: data/processed/<lake_id>_era5_daily.csv, data/processed/<lake_id>_vcs_on_daily.csv, plus station/buoy files as applicable).
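
The aggregation and conversion rules above can be sketched in base R as follows. This is an illustration, not the code in scripts/02_prepare_raw_data.R: the input data frame, the Wind_Spd_kn column, and the helper names knots_to_ms and mj_day_to_wm2 are assumptions, and the input is taken to be stamped in UTC already.

```r
# Hypothetical 12-hourly input already stamped in UTC.
raw <- data.frame(
  DateTime = as.POSIXct(c("2024-01-01 00:00:00", "2024-01-01 12:00:00",
                          "2024-01-02 00:00:00", "2024-01-02 12:00:00"),
                        tz = "UTC"),
  Temp_C      = c(10, 20, 12, 18),
  Wind_Spd_kn = c(10, 20, 5, 15),  # knots, converted below
  Precip_mm   = c(1, 2, 0, 4)      # accumulation per 12 h step
)

# Unit conversions (hypothetical helper names).
knots_to_ms   <- function(kn) kn * 0.514444     # 1 knot = 0.514444 m/s
mj_day_to_wm2 <- function(mj) mj * 1e6 / 86400  # daily MJ/m^2 -> mean W/m^2

raw$Wind_Spd_ms <- knots_to_ms(raw$Wind_Spd_kn)
raw$Date <- as.Date(raw$DateTime, tz = "UTC")   # UTC calendar day

# Means for state variables, sums for flux variables.
daily_state <- aggregate(cbind(Temp_C, Wind_Spd_ms) ~ Date, data = raw, FUN = mean)
daily_flux  <- aggregate(Precip_mm ~ Date, data = raw, FUN = sum)
daily <- merge(daily_state, daily_flux, by = "Date")
```

If your raw timestamps are in local time, parse them with the local tz first; as.Date(..., tz = "UTC") then assigns each record to its UTC day.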

scripts/03_analysis_helpers.R

This file currently reads Rotorua processed data when sourced.
If you are not keeping Rotorua files, update the file paths to your new lake or remove/guard those read lines so sourcing does not error.
The metrics functions (metrics_vs_ref) are reusable and do not need changes unless you change variable names.
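
One way to guard the read lines is a small wrapper that returns NULL when a file is absent. This is a sketch only: read_if_exists is a hypothetical helper, and the Rotorua filename below is an assumed example, not necessarily the name used in the script.

```r
# Hypothetical helper: read a processed CSV only when it is present.
read_if_exists <- function(path) {
  if (!file.exists(path)) {
    message("Skipping missing file: ", path)
    return(NULL)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Assumed example filename; the result is NULL if the file is absent.
rotorua_era5 <- read_if_exists(
  file.path("data", "processed", "rotorua_era5_daily.csv")
)
```

Code that uses the result then needs an is.null() check, but sourcing the script no longer stops on a missing file.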

scripts/04_analysis_plotting.R

This file also reads Rotorua processed data when sourced.
If you are not keeping Rotorua files, update those read paths or guard them.
Update ref_df, targets_list, and target_colors defaults if your reference/targets have different names.
Plot functions are reusable. They assume column names Date, Temp_C, Precip_mm, Wind_Spd_ms, RadSWD_Wm2.
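
If your processed files use different column names, renaming them before plotting keeps the plot functions untouched. The sketch below assumes a hypothetical mapping (col_map) and a hypothetical helper (standardise_columns); only the five expected column names come from this project.

```r
expected_cols <- c("Date", "Temp_C", "Precip_mm", "Wind_Spd_ms", "RadSWD_Wm2")

# Hypothetical mapping: names = columns in your file, values = pipeline names.
col_map <- c(date = "Date", air_temp = "Temp_C", rain = "Precip_mm",
             wind = "Wind_Spd_ms", sw_down = "RadSWD_Wm2")

# Rename mapped columns, then stop if any expected column is still missing.
standardise_columns <- function(df, col_map, expected = expected_cols) {
  hits <- names(df) %in% names(col_map)
  names(df)[hits] <- unname(col_map[names(df)[hits]])
  missing <- setdiff(expected, names(df))
  if (length(missing) > 0) {
    stop("Missing expected columns: ", paste(missing, collapse = ", "))
  }
  df
}
```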

scripts/05_analysis_rotorua_plots.R

Rotorua-specific example. Update it only if you want a standalone script for a new lake. The QMDs do not depend on this file.

Changes for a new lake: QMD + YAML

These edits control the rendered report. They are required in addition to the script changes above.

params.yml (optional single source of truth)

This file contains a params: block that the QMDs use if you remove their local params blocks.
If you keep local params inside each QMD (current setup), update them in each QMD instead.

QMD front matter

Update these in each of: index.qmd, Overview.qmd, Results.qmd, metrics_stats.qmd, Behaviour.qmd, rotorua.qmd:

lake_name, lat, lon, buffer_km, lake_id
reference and targets
vars and thresholds (wet_threshold_mm, precip_event_threshold, windy_top_pct, windy_threshold_ms, window_days)
In Overview.qmd, update any station/point coordinates and labels used for the map.

QMD file_map blocks

Each of the QMDs listed above contains a file_map <- list(...) block. Update those paths to your new processed files.
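
As an illustration, a file_map for a new lake might look like the sketch below. The layout (one named entry per dataset, keyed by the names used in reference and targets) is an assumption about the existing blocks, so match whatever structure your QMDs already use. The existence check is an optional addition.

```r
lake_id <- "tutera"
processed_dir <- file.path("data", "processed")

# Assumed layout: one named entry per dataset.
file_map <- list(
  ERA5   = file.path(processed_dir, paste0(lake_id, "_era5_daily.csv")),
  VCS_On = file.path(processed_dir, paste0(lake_id, "_vcs_on_daily.csv"))
)

# Report any entries that do not exist yet, before rendering.
missing_files <- names(file_map)[!file.exists(unlist(file_map))]
if (length(missing_files) > 0) {
  message("Not found: ", paste(missing_files, collapse = ", "))
}
```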

QMD plot calls

Some plots are hard-coded to specific variables (Temp, Wind, Precip). If you remove variables or change the reference name:

Wrap plot calls with checks like if ("Temp_C" %in% vars) ... to avoid errors.
Pass ref_name = params$reference into plot functions if your reference is not Airport_1770.
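
Both points can be sketched together. Here plot_var is a hypothetical stand-in for one of the project's plot functions, and params is built by hand rather than read from the QMD front matter:

```r
# Hand-built params for illustration; in a QMD these come from the front matter.
params <- list(reference = "VCS_On", vars = c("Wind_Spd_ms"))

# Hypothetical stand-in for a plot function; returns a label instead of a plot.
plot_var <- function(var, ref_name) sprintf("%s vs %s", var, ref_name)

plotted <- character(0)
for (v in c("Temp_C", "Wind_Spd_ms", "Precip_mm")) {
  # Skip variables that were removed from params$vars.
  if (v %in% params$vars) {
    plotted <- c(plotted, plot_var(v, ref_name = params$reference))
  }
}
```

With vars limited to Wind_Spd_ms, only the wind plot call runs and it is labelled with the configured reference.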

_quarto.yml

Add or remove QMD files in the project.render list if you change page names.
In this project, the rendered site is written to docs/.

Step-by-step: new lake run

Place raw files under data/raw/<your_lake>/....
Edit scripts/02_prepare_raw_data.R for your new lake and run it to create data/processed/<lake_id>_*.csv.
Update the QMD params and file_map blocks to point at your new processed files.
Render the site: quarto render.
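
From an R session at the project root, the run and render steps might be chained as in the sketch below. It assumes the quarto command-line tool is on your PATH. The guards are additions for illustration, so nothing runs if the script or the tool is missing.

```r
prep_script <- file.path("scripts", "02_prepare_raw_data.R")

if (file.exists(prep_script)) {
  source(prep_script)                 # writes data/processed/<lake_id>_*.csv
  if (nzchar(Sys.which("quarto"))) {
    system2("quarto", "render")       # same as running `quarto render` in a shell
  } else {
    message("quarto not found on PATH; render from a terminal instead.")
  }
} else {
  message("Run this from the project root: ", prep_script, " not found.")
}
```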

Example: Lake Tutera (template)

Use this as a starting point for naming and params:

lake_name: "Lake Tutera"
lat: -39.134088
lon: 176.552553
buffer_km: 10
lake_id: "tutera"
reference: "Airport_XXXX"
targets: ["ERA5", "VCS_On"]
vars: ["Temp_C", "Precip_mm", "Wind_Spd_ms", "RadSWD_Wm2"]
wet_threshold_mm: 1
precip_event_threshold: 1
windy_top_pct: 0.10
windy_threshold_ms: 10
window_days: 30

Example processed filenames:

data/processed/tutera_era5_daily.csv
data/processed/tutera_vcs_on_daily.csv
data/processed/tutera_airport_XXXX_daily.csv

Smaller analysis: two datasets only (ERA5 + VCS_On)

Set reference to the dataset you want as the benchmark (e.g., VCS_On).
Set targets: ["ERA5"].
In each QMD file_map, keep only those two files (plus the reference file if it is separate).
Update any plot calls or titles that still say Airport_1770 to use params$reference.

This will produce the same metrics, plots, and diagnostics, but only for the chosen pair.
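
To make the pairwise comparison concrete, the sketch below joins a reference and a target on Date and computes bias and RMSE for one variable. It is an illustration with made-up values: pair_metrics is a hypothetical function, not the project's metrics_vs_ref, whose signature and outputs may differ.

```r
# Hypothetical daily series for a reference (VCS_On) and a target (ERA5).
ref <- data.frame(Date = as.Date("2024-01-01") + 0:3, Temp_C = c(10, 12, 14, 16))
tgt <- data.frame(Date = as.Date("2024-01-01") + 1:4, Temp_C = c(13, 14, 18, 20))

# Compare target against reference on the dates both datasets share.
pair_metrics <- function(ref, tgt, var) {
  both <- merge(ref[, c("Date", var)], tgt[, c("Date", var)],
                by = "Date", suffixes = c("_ref", "_tgt"))
  err <- both[[paste0(var, "_tgt")]] - both[[paste0(var, "_ref")]]
  data.frame(variable = var, n = nrow(both),
             bias = mean(err), rmse = sqrt(mean(err^2)))
}

pair_metrics(ref, tgt, "Temp_C")
```

Only the three overlapping days enter the calculation, so differing record lengths between the two datasets do not need to be trimmed by hand.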

Smaller analysis: one variable only

Set vars to a single variable (e.g., ["Wind_Spd_ms"]).
Ensure that variable exists in every processed dataset.
In Results.qmd, Behaviour.qmd, and metrics_stats.qmd, remove or guard plot chunks that reference other variables.

This keeps the workflow identical but limits output to one variable.
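
Before rendering, a quick check can confirm that the chosen variable is present in every dataset. The sketch uses small in-memory data frames as hypothetical stand-ins for the processed CSVs:

```r
vars <- c("Wind_Spd_ms")

# Hypothetical stand-ins for the processed datasets.
datasets <- list(
  ERA5   = data.frame(Date = as.Date("2024-01-01"), Wind_Spd_ms = 3.2, Temp_C = 15),
  VCS_On = data.frame(Date = as.Date("2024-01-01"), Temp_C = 14)
)

# For each dataset, list the requested variables it lacks; keep only offenders.
missing_by_dataset <- lapply(datasets, function(df) setdiff(vars, names(df)))
missing_by_dataset <- missing_by_dataset[lengths(missing_by_dataset) > 0]
```

An empty list means every dataset has the variable. In this example VCS_On is flagged because it has no Wind_Spd_ms column.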