Spatial Data Fusion for Extreme Sea Levels
Overview
The evfuse package implements a two-stage frequentist
framework for fusing sparse observations and dense simulations for
spatial extreme value analysis (White et al., in preparation). While
developed for U.S. coastal sea levels (NOAA tidal gauges + ADCIRC
simulations), the framework applies to any setting with annual maxima
from multiple spatial data sources.
- Stage 1: Fit GEV distributions independently at each site.
- Stage 2: Fit a joint multivariate Gaussian process that models cross-source dependence via a coregionalization (LMC) structure.
- Kriging: Predict at new locations, drawing on information from all sources even when each source only observes a subset of the parameters.
- Return levels: Estimate r-year return levels with uncertainty via Monte Carlo simulation and/or the delta method.
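For reference, the r-year return level in the last step is the 1 - 1/r quantile of the fitted GEV distribution. The standard closed form is given below, together with the generic delta-method variance approximation. Here V denotes the covariance matrix of the estimated (mu, log sigma, xi); the symbols z_r and V are notation introduced for this note and are not taken from the package.

```latex
\[
z_r =
\begin{cases}
\mu + \dfrac{\sigma}{\xi}\left[\left(-\log\left(1 - \tfrac{1}{r}\right)\right)^{-\xi} - 1\right], & \xi \neq 0,\\[2ex]
\mu - \sigma \log\left(-\log\left(1 - \tfrac{1}{r}\right)\right), & \xi = 0,
\end{cases}
\qquad
\operatorname{Var}(\hat z_r) \approx \nabla z_r^{\top}\, V\, \nabla z_r .
\]
```

The gradient is taken with respect to (mu, log sigma, xi), which matches the parameterization in which the estimates are stored.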
Data
The package includes coast_data, a dataset of annual
maximum sea levels at 129 sites along the U.S. Gulf and Atlantic coasts:
29 NOAA tidal gauge stations and 100 ADCIRC numerical simulation
points.
Stage 1: GEV Fitting
Fit GEV(mu, sigma, xi) at each site via maximum likelihood. Parameters are stored as (mu, log sigma, xi) for unconstrained optimization.
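As a stand-alone illustration of what this stage does at a single site, the sketch below fits a GEV by maximum likelihood in the (mu, log sigma, xi) parameterization using base R only. It is a minimal sketch on simulated data, not the package's implementation; the names gev_nll, start, and theta_hat are hypothetical, and the optimizer settings are an assumption.

```r
# Negative log-likelihood of GEV(mu, sigma, xi) for annual maxima x,
# with theta = (mu, log sigma, xi) so that sigma > 0 holds automatically.
gev_nll <- function(theta, x) {
  mu <- theta[1]
  sigma <- exp(theta[2])
  xi <- theta[3]
  z <- (x - mu) / sigma
  if (abs(xi) < 1e-8) {
    # Gumbel limit as xi -> 0
    return(length(x) * log(sigma) + sum(z) + sum(exp(-z)))
  }
  tt <- 1 + xi * z
  if (any(tt <= 0)) return(1e10)  # large penalty outside the GEV support
  length(x) * log(sigma) + (1 + 1 / xi) * sum(log(tt)) + sum(tt^(-1 / xi))
}

# Simulated annual maxima from GEV(mu = 1, sigma = 0.3, xi = 0.1) via the quantile function
set.seed(1)
u <- runif(60)
x <- 1.0 + 0.3 * ((-log(u))^(-0.1) - 1) / 0.1

# Moment-based Gumbel starting values, with a small positive shape
sigma0 <- sd(x) * sqrt(6) / pi
start <- c(mean(x) - 0.5772 * sigma0, log(sigma0), 0.05)

fit <- optim(start, gev_nll, x = x)  # Nelder-Mead by default
theta_hat <- fit$par                 # (mu, log sigma, xi)
```

In the package, the fit at every site is done with a single call: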
stage1 <- fit_gev_all(coast_data)
sum(stage1$converged)
# [1] 129
Bootstrap Measurement Uncertainty
Estimate the covariance of the Stage 1 MLEs via nonparametric block bootstrap (resampling years to preserve spatial dependence within each data source). Apply Wendland tapering to enforce sparsity.
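The two ideas in this step can be sketched on toy data. The example resamples whole years (so every site sees the same resampled years) and then multiplies the resulting covariance elementwise by a compactly supported taper. It is an illustration under assumptions: the tutorial does not state which Wendland function taper_W() uses, so the Wendland-1 form below is a guess, column means stand in for refitting the GEV, and all names and numbers are hypothetical.

```r
# Wendland-1 taper: 1 at distance 0, exactly 0 for d >= lambda.
# ASSUMPTION: the Wendland order used by the package is not stated in the tutorial.
wendland1 <- function(d, lambda) {
  h <- pmin(d / lambda, 1)
  (1 - h)^4 * (1 + 4 * h)
}

# Toy setup: 4 sites on a line, 40 years of hypothetical annual maxima
coords <- c(0, 100, 250, 600)
D_toy <- abs(outer(coords, coords, "-"))
set.seed(42)
n_years <- 40
maxima <- matrix(rnorm(n_years * length(coords)), n_years, length(coords))

# Year-resampling bootstrap: the same resampled years are used at every site
B <- 200
boot_est <- t(replicate(B, {
  idx <- sample(n_years, replace = TRUE)
  colMeans(maxima[idx, , drop = FALSE])  # stand-in for refitting the GEV per site
}))
W_toy <- cov(boot_est)  # 4 x 4 bootstrap covariance of the site-level estimates

# Tapering is an elementwise (Schur) product; entries beyond lambda become exactly 0
W_toy_tap <- W_toy * wendland1(D_toy, lambda = 300)
```

Site pairs farther apart than lambda get a covariance of exactly zero, while the diagonal (the per-site variances) is unchanged. The package functions apply the same two steps to the full dataset: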
D <- compute_distances(coast_data$sites)
bs <- bootstrap_W(coast_data, B = 500, seed = 42)
W_tap <- taper_W(bs$W_bs, D, lambda = 300)
Stage 2: Joint Spatial Model
The joint model has p=6 dimensions (3 per source) with parameters:
- beta (6x1): Mean vector for (mu_NOAA, log_sigma_NOAA, xi_NOAA, mu_ADCIRC, log_sigma_ADCIRC, xi_ADCIRC).
- A (6x6 lower triangular): Cross-covariance factor such that the inter-parameter covariance at distance 0 is A A^T.
- rho (6x1): Range parameters for the 6 latent exponential correlation functions.
The observed parameter vector theta_hat ~ N(beta ⊗ 1_L, Sigma + W_tap), where ⊗ denotes the Kronecker product and 1_L is a vector of ones over the L sites. Each site contributes only its observed dimensions (NOAA: 1-3, ADCIRC: 4-6).
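One standard way to write a coregionalization covariance that is consistent with the bullets above is the following, where a_k denotes the k-th column of A and d(s, s') the distance between two sites. This is a sketch of the usual LMC form; the package's exact parameterization may differ.

```latex
\[
\operatorname{Cov}\bigl(\theta(s), \theta(s')\bigr)
= \sum_{k=1}^{6} a_k a_k^{\top} \exp\!\left(-\frac{d(s, s')}{\rho_k}\right),
\qquad
\operatorname{Cov}\bigl(\theta(s), \theta(s)\bigr) = \sum_{k=1}^{6} a_k a_k^{\top} = A A^{\top}.
\]
```

The off-diagonal 3x3 blocks of A A^T are what link the NOAA and ADCIRC parameters, and this is how a site that observes only one source still informs the other source's parameters.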
model <- fit_spatial_model(stage1, coast_data, W_tap, D,
control = list(maxit = 2000, trace = 0))
model$optim_result$convergence
# [1] 0
Kriging and Return Levels
Predict at new locations. The kriging predictor fuses information from both NOAA and ADCIRC observations to produce NOAA-scale GEV parameters with reduced uncertainty.
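The underlying computation is Gaussian conditioning. The sketch below shows it for a single parameter in one spatial dimension, with an exponential covariance and Stage 1 estimation variances added to the diagonal. It is a simplified illustration, not the package's multivariate predictor; all coordinates, values, and hyperparameters are made up.

```r
# Exponential covariance function
exp_cov <- function(d, sill, rho) sill * exp(-d / rho)

s_obs <- c(0, 50, 120, 300)         # observed site coordinates (hypothetical)
y_obs <- c(1.10, 1.25, 1.05, 0.90)  # Stage 1 estimates of one parameter (made up)
w_obs <- c(0.02, 0.01, 0.03, 0.02)  # Stage 1 estimation variances (made up)
beta <- 1.0                         # mean
sill <- 0.04                        # marginal variance of the latent field
rho <- 150                          # range
s_new <- c(25, 200)                 # prediction locations

# Covariance of the observed estimates: spatial signal plus estimation error
K <- exp_cov(abs(outer(s_obs, s_obs, "-")), sill, rho) + diag(w_obs)
# Cross-covariance between new and observed locations (2 x 4)
k0 <- exp_cov(abs(outer(s_new, s_obs, "-")), sill, rho)

# Conditional (kriging) mean and variance of the latent parameter at s_new
pred_mean <- beta + drop(k0 %*% solve(K, y_obs - beta))
pred_var <- sill - rowSums(k0 * t(solve(K, t(k0))))
```

The predictive variance is always below the prior variance sill, and the reduction is larger when nearby sites have small estimation variance. In the joint model the same conditioning runs over all six dimensions at once. The package call is: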
# Predict at a few example coastal sites
new_sites <- data.frame(lon = c(-90.0, -81.5, -75.5), lat = c(30.0, 31.5, 36.8))
preds <- predict_krig(model, new_sites)
rl <- compute_return_levels(preds, r = 100)
# 100-year return levels at year-2000 reference conditions
rl$return_level
Model Validation: Leave-One-Out CV
The closed-form block LOO-CV (Rasmussen & Williams, 2006, eq. 5.12) evaluates predictive performance at each NOAA site without refitting.
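The scalar version of this identity is easy to check numerically. The sketch below computes leave-one-out predictive means and variances from the inverse covariance matrix and compares the means with an explicit refit-free conditioning on the remaining sites. It uses a zero-mean toy process with hypothetical settings; the block version used for three parameters per site replaces the diagonal entry of the inverse with the corresponding 3x3 block.

```r
set.seed(1)
n <- 6
s <- sort(runif(n, 0, 300))  # hypothetical site coordinates

# Covariance of the observations: exponential signal plus noise on the diagonal
K <- 0.04 * exp(-abs(outer(s, s, "-")) / 150) + diag(0.01, n)
y <- drop(t(chol(K)) %*% rnorm(n))  # one zero-mean draw with covariance K

# Closed-form leave-one-out predictive moments (Rasmussen & Williams, eq. 5.12)
Kinv <- solve(K)
loo_mean <- y - drop(Kinv %*% y) / diag(Kinv)
loo_var <- 1 / diag(Kinv)
loo_lpd <- dnorm(y, loo_mean, sqrt(loo_var), log = TRUE)

# Brute-force check: condition site i on all other sites explicitly
brute_mean <- sapply(seq_len(n), function(i) {
  drop(K[i, -i, drop = FALSE] %*% solve(K[-i, -i], y[-i]))
})
```

The two sets of means agree to numerical precision, so the closed form avoids n separate matrix solves. With the package: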
loo <- loo_cv(model)
sum_loo <- loo_summary(loo, r = 100)
# Total log predictive density
sum_loo$total_lpd
# 100-year return level RMSE
sum_loo$rl_rmse
Custom Data Sources
The source_params argument generalizes the framework
beyond NOAA/ADCIRC. Each entry maps a data source name to its parameter
indices in the joint model:
# Example: 3 sources, each observing 3 GEV parameters
dat <- load_data(df, source_params = list(
gauge = 1:3,
satellite = 4:6,
reanalysis = 7:9
))
# Fit a 9-dimensional joint model (stage1, W_tap, and D must first be recomputed from dat)
model <- fit_spatial_model(stage1, dat, W_tap, D)
References
White, B. N., Blanton, B., Luettich, R., & Smith, R. L. (2026). Fusing Sparse Observations and Dense Simulations for Spatial Extreme Value Analysis: Application to U.S. Coastal Sea Levels. arXiv preprint arXiv:2603.03247.
Russell, B. T., Risser, M. D., Smith, R. L., & Kunkel, K. E. (2020). Investigating the association between late spring Gulf of Mexico sea surface temperatures and U.S. Gulf Coast precipitation extremes with focus on Hurricane Harvey. Environmetrics, 31(2), e2595.
Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.