Reproducibility / How to reuse
Use this guide to rerun the Rotorua workflow or adapt it to a new lake. The project flow is:
- Prepare raw data into daily, standardized CSVs (scripts/02_prepare_raw_data.R).
- Load the processed files in the helper and plotting scripts (scripts/03_analysis_helpers.R and scripts/04_analysis_plotting.R).
- Source those scripts in the QMDs, compute metrics, and render plots.
- Render the site with Quarto.
Alternatively, run scripts/05_analysis_rotorua_plots.R to print the plots without rendering the QMDs.
This page separates required changes to scripts (data prep and helper code) from changes to QMDs/YAML (report configuration).
Setup (once per machine)
- Install R (>= 4.4) and Quarto.
- From the project root, run scripts/00_project_setup.R to restore renv and required GitHub packages.
- If renv::restore() fails (offline), set a reachable CRAN mirror with options(repos = ...) and try again.
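A minimal sketch of that retry, assuming the project keeps an renv.lock at its root. The mirror URL is only an example; any CRAN mirror you can reach will do.

```r
# Point R at a reachable CRAN mirror (the URL here is an assumption;
# substitute whichever mirror works on your network).
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Retry the restore only when renv is installed and a lockfile is present.
if (requireNamespace("renv", quietly = TRUE) && file.exists("renv.lock")) {
  renv::restore(prompt = FALSE)
}
```

Setting the option in the session affects only that session. To make it persistent, the same options() call could go in a project-level .Rprofile.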
Changes for a new lake: scripts
Make these edits in scripts if you want to re-use the pipeline for another lake. These changes affect data creation and helper defaults.
scripts/02_prepare_raw_data.R
- Update the raw file paths under data/raw/ to your new lake folders and filenames.
- Update station IDs in file paths and any column name mappings that differ.
- Keep the UTC conversion and daily aggregation rules, using means for state variables (Temp_C, Wind_Spd_ms, RadSWD_Wm2) and sums for flux variables (Precip_mm).
- Keep or update unit conversions, including wind knots to m/s if needed, sub-daily precip rates to daily totals, and radiation MJ/m^2 to W/m^2 when required.
- Update any bad-data filtering to match your lake (e.g., remove known corrupt periods).
- Update the output filenames at the end so they match your new lake ID (for example: data/processed/<lake_id>_era5_daily.csv, data/processed/<lake_id>_vcs_on_daily.csv, plus station/buoy files as applicable).
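The aggregation and conversion rules above can be illustrated with a small base R sketch. The input data frame, the datetime_utc and Wind_Spd_kt column names, and the mj_per_day_to_wm2 helper are hypothetical; the actual script may structure this differently.

```r
# Hypothetical sub-daily input: UTC timestamps, wind in knots,
# precipitation in mm accumulated over each time step.
raw <- data.frame(
  datetime_utc = as.POSIXct(
    c("2024-01-01 00:00", "2024-01-01 12:00",
      "2024-01-02 00:00", "2024-01-02 12:00"),
    tz = "UTC"
  ),
  Temp_C      = c(10, 14, 8, 12),
  Wind_Spd_kt = c(10, 20, 4, 6),
  Precip_mm   = c(0.5, 1.5, 0, 2)
)

# Unit conversion: 1 knot = 0.514444 m/s.
raw$Wind_Spd_ms <- raw$Wind_Spd_kt * 0.514444

# Daily key derived in UTC.
raw$Date <- as.Date(raw$datetime_utc, tz = "UTC")

# Means for state variables, sums for flux variables.
# Note: the formula interface drops rows containing NA by default.
daily_means <- aggregate(cbind(Temp_C, Wind_Spd_ms) ~ Date, data = raw, FUN = mean)
daily_sums  <- aggregate(Precip_mm ~ Date, data = raw, FUN = sum)
daily <- merge(daily_means, daily_sums, by = "Date")

# Radiation: a daily total in MJ/m^2 becomes a mean flux in W/m^2
# by converting to joules and dividing by the seconds in a day.
mj_per_day_to_wm2 <- function(mj) mj * 1e6 / 86400
```

If the radiation totals cover a different accumulation period (for example hourly), the divisor would need to be the number of seconds in that period instead of 86400.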
scripts/03_analysis_helpers.R
- This file currently reads Rotorua processed data when sourced.
- If you are not keeping Rotorua files, update the file paths to your new lake or remove/guard those read lines so sourcing does not error.
- The metrics functions (metrics_vs_ref) are reusable and do not need changes unless you change variable names.
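One way to guard the read lines is a small wrapper that returns NULL when a file is missing. This is a sketch only: read_if_exists and the example path are hypothetical and not part of the project.

```r
# Read a processed CSV only when it exists, so that sourcing the
# script does not stop with an error for a lake without that file.
read_if_exists <- function(path) {
  if (!file.exists(path)) {
    message("Skipping missing file: ", path)
    return(NULL)
  }
  df <- read.csv(path, stringsAsFactors = FALSE)
  # Parse the Date column when present (assumes ISO yyyy-mm-dd dates).
  if ("Date" %in% names(df)) df$Date <- as.Date(df$Date)
  df
}

era5 <- read_if_exists("data/processed/tutera_era5_daily.csv")
```

Code that uses the result then needs to handle the NULL case, for example with if (!is.null(era5)) around any step that depends on it.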
scripts/04_analysis_plotting.R
- This file also reads Rotorua processed data when sourced.
- If you are not keeping Rotorua files, update those read paths or guard them.
- Update ref_df, targets_list, and target_colors defaults if your reference/targets have different names.
- Plot functions are reusable. They assume column names Date, Temp_C, Precip_mm, Wind_Spd_ms, RadSWD_Wm2.
scripts/05_analysis_rotorua_plots.R
- Rotorua-specific example. Update it only if you want to run a standalone script for a new lake. The QMDs do not depend on this file.
Changes for a new lake: QMD + YAML
These edits control the rendered report and are required even if you only update the scripts.
params.yml (optional single source of truth)
- This file contains params: used by QMDs if you remove local params blocks.
- If you keep local params inside each QMD (current setup), update them in each QMD instead.
QMD front matter
Update these in each of: index.qmd, Overview.qmd, Results.qmd, metrics_stats.qmd, Behaviour.qmd, rotorua.qmd:
- lake_name, lat, lon, buffer_km, lake_id
- reference and targets
- vars and thresholds (wet_threshold_mm, precip_event_threshold, windy_top_pct, windy_threshold_ms, window_days)
- In Overview.qmd, update any station/point coordinates and labels used for the map.
QMD file_map blocks
Each of the QMDs listed above contains a file_map <- list(...) block. Update those paths to your new processed files.
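As a rough sketch, a file_map for a new lake might be built from the lake ID as below. The keys ERA5 and VCS_On are assumed to match the values used in reference and targets, and the lake_id value is hypothetical; check the existing blocks for the exact names they use.

```r
lake_id <- "tutera"  # hypothetical lake ID

# Keys are assumed to match the dataset names in `reference` and `targets`.
file_map <- list(
  ERA5   = file.path("data", "processed", paste0(lake_id, "_era5_daily.csv")),
  VCS_On = file.path("data", "processed", paste0(lake_id, "_vcs_on_daily.csv"))
)

# Report wrong paths early, with a readable message.
missing_files <- names(file_map)[!file.exists(unlist(file_map))]
if (length(missing_files) > 0) {
  warning("Missing processed files for: ", paste(missing_files, collapse = ", "))
}
```

Deriving the paths from lake_id means a rename only has to happen in one place per QMD.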
QMD plot calls
Some plots are hard-coded to specific variables (Temp, Wind, Precip). If you remove variables or change the reference name:
- Wrap plot calls with checks like if ("Temp_C" %in% vars) ... to avoid errors.
- Pass ref_name = params$reference into plot functions if your reference is not Airport_1770.
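The guard pattern can be sketched as follows. plot_timeseries is a hypothetical stand-in for the project's plot functions, and vars and ref_name are set by hand here; inside a QMD they would come from params.

```r
vars     <- c("Wind_Spd_ms")  # as set in the QMD params
ref_name <- "VCS_On"          # would be params$reference inside a QMD

# Hypothetical stand-in for one of the project's plot functions.
plot_timeseries <- function(var, ref_name) {
  sprintf("plot of %s against %s", var, ref_name)
}

# Call a plot only when its variable is part of the analysis.
plotted <- character(0)
for (v in c("Temp_C", "Wind_Spd_ms", "Precip_mm")) {
  if (v %in% vars) {
    plotted <- c(plotted, plot_timeseries(v, ref_name = ref_name))
  }
}
plotted
```

With vars limited to Wind_Spd_ms, only the wind plot is produced and the temperature and precipitation calls are skipped instead of failing.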
_quarto.yml
- Add or remove QMD files in the project.render list if you change page names.
- The site outputs to docs/ by default.
Step-by-step: new lake run
- Place raw files under data/raw/<your_lake>/....
- Edit scripts/02_prepare_raw_data.R for your new lake and run it to create data/processed/<lake_id>_*.csv.
- Update the QMD params and file_map blocks to point at your new processed files.
- Render the site: quarto render.
Example: Lake Tutera (template)
Use this as a starting point for naming and params:
lake_name: "Lake Tutera"
lat: -39.134088
lon: 176.552553
buffer_km: 10
lake_id: "tutera"
reference: "Airport_XXXX"
targets: ["ERA5", "VCS_On"]
vars: ["Temp_C", "Precip_mm", "Wind_Spd_ms", "RadSWD_Wm2"]
wet_threshold_mm: 1
precip_event_threshold: 1
windy_top_pct: 0.10
windy_threshold_ms: 10
window_days: 30
Example processed filenames:
- data/processed/tutera_era5_daily.csv
- data/processed/tutera_vcs_on_daily.csv
- data/processed/tutera_airport_XXXX_daily.csv
Smaller analysis: two datasets only (ERA5 + VCS_On)
- Set reference to the dataset you want as the benchmark (e.g., VCS_On).
- Set targets: ["ERA5"].
- In each QMD file_map, keep only those two files (plus the reference file if it is separate).
- Update any plot calls or titles that still say Airport_1770 to use params$reference.
This will produce the same metrics, plots, and diagnostics, but only for the chosen pair.
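To show what a reference-versus-target comparison for a single pair involves, here is an illustrative base R sketch that aligns two daily series by Date and computes bias and RMSE. It is not the project's metrics_vs_ref, whose inputs and outputs may differ, and the data are made up.

```r
# Hypothetical daily series for a reference and one target.
ref <- data.frame(Date = as.Date("2024-01-01") + 0:2, Temp_C = c(10, 12, 14))
tgt <- data.frame(Date = as.Date("2024-01-01") + 1:3, Temp_C = c(13, 13, 15))

# Keep only the dates present in both datasets.
both <- merge(ref, tgt, by = "Date", suffixes = c("_ref", "_tgt"))

# Error is target minus reference on the overlapping dates.
err  <- both$Temp_C_tgt - both$Temp_C_ref
bias <- mean(err)
rmse <- sqrt(mean(err^2))
```

Only overlapping dates contribute, so a short overlap between the two datasets will limit how representative the metrics are.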
Smaller analysis: one variable only
- Set vars to a single variable (e.g., ["Wind_Spd_ms"]).
- Ensure that variable exists in every processed dataset.
- In Results/Behaviour/Metrics QMDs, remove or guard plot chunks that reference other variables.
This keeps the workflow identical but limits output to one variable.
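A quick way to confirm the chosen variable exists everywhere is to check each loaded dataset before rendering. The datasets list below is a hypothetical stand-in for whatever the QMDs load from file_map.

```r
vars <- c("Wind_Spd_ms")

# Hypothetical stand-ins for the processed datasets.
datasets <- list(
  ERA5   = data.frame(Date = as.Date("2024-01-01"), Wind_Spd_ms = 3.2),
  VCS_On = data.frame(Date = as.Date("2024-01-01"), Temp_C = 11.0)
)

# For each dataset, list the requested variables it does not contain,
# then keep only the datasets with something missing.
lacking <- lapply(datasets, function(df) setdiff(vars, names(df)))
lacking <- lacking[lengths(lacking) > 0]
```

In this example lacking names VCS_On as missing Wind_Spd_ms, which signals that the dataset must be fixed or dropped from targets before rendering.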