evfuse: Spatial Data Fusion for Extreme Sea Levels

Overview

The evfuse package implements a two-stage frequentist framework that fuses sparse observations with dense simulations for spatial extreme value analysis (White et al., 2026). Although it was developed for U.S. coastal sea levels (NOAA tidal gauges plus ADCIRC simulations), the framework applies to any setting with annual maxima from multiple spatial data sources.

Stage 1: Fit GEV distributions independently at each site.
Stage 2: Fit a joint multivariate Gaussian process that captures cross-source dependence through a linear model of coregionalization (LMC).
Kriging: Predict at new locations, drawing on information from all sources even when each source only observes a subset of the parameters.
Return levels: Estimate r-year return levels with uncertainty via Monte Carlo simulation and/or the delta method.

Data

The package includes coast_data, a dataset of annual maximum sea levels at 129 sites along the U.S. Gulf and Atlantic coasts: 29 NOAA tidal gauge stations and 100 ADCIRC numerical simulation points.

library(evfuse)
data(coast_data)

coast_data$n_sites
# [1] 129

head(coast_data$sites)

Stage 1: GEV Fitting

Fit GEV(mu, sigma, xi) at each site via maximum likelihood. Parameters are stored as (mu, log sigma, xi), so the scale parameter stays positive under unconstrained optimization.

stage1 <- fit_gev_all(coast_data)
sum(stage1$converged)
# [1] 129
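
To illustrate the (mu, log sigma, xi) parameterization, here is a minimal base-R sketch of a single-site GEV fit on simulated data. It is not the implementation behind fit_gev_all; the function gev_nll, the simulated maxima, and the starting values are all assumptions made for this example.

```r
# Negative log-likelihood of GEV(mu, sigma, xi), with
# par = c(mu, log_sigma, xi) so optim() can search without constraints.
gev_nll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); xi <- par[3]
  z <- (x - mu) / sigma
  if (abs(xi) < 1e-8) {
    return(sum(log(sigma) + z + exp(-z)))   # Gumbel limit as xi -> 0
  }
  w <- 1 + xi * z
  if (any(w <= 0)) return(1e10)             # outside the support: large penalty
  sum(log(sigma) + (1 + 1 / xi) * log(w) + w^(-1 / xi))
}

# Simulate 60 hypothetical annual maxima by inverse-CDF sampling
set.seed(1)
mu0 <- 1.0; sigma0 <- 0.3; xi0 <- 0.1
x <- mu0 + sigma0 / xi0 * ((-log(runif(60)))^(-xi0) - 1)

# Moment-based starting values (Gumbel approximation), then Nelder-Mead
s_start <- sqrt(6) * sd(x) / pi
start <- c(mean(x) - 0.5772 * s_start, log(s_start), 0.1)
fit <- optim(start, gev_nll, x = x)

theta_hat <- fit$par   # estimates of (mu, log sigma, xi)
```

Applying exp() to the second element of theta_hat recovers the scale estimate on its natural scale.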

Bootstrap Measurement Uncertainty

Estimate W, the covariance of the Stage 1 MLEs, with a nonparametric block bootstrap that resamples whole years, which preserves spatial dependence within each data source. Then apply a compactly supported Wendland taper to make the estimated covariance sparse.

D <- compute_distances(coast_data$sites)
bs <- bootstrap_W(coast_data, B = 500, seed = 42)
W_tap <- taper_W(bs$W_bs, D, lambda = 300)
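
The two ideas above, resampling whole years and tapering by distance, can be sketched in base R on toy data. This is an illustration under stated assumptions, not the code inside bootstrap_W or taper_W: the per-site mean stands in for the GEV MLE, the site positions are hypothetical, and the particular Wendland form (1 - h)^4 (1 + 4h) is assumed because the package's exact taper is not stated here.

```r
set.seed(42)
n_years <- 40; n_sites <- 5; B <- 200

# Hypothetical data: rows are years, columns are sites. A shared yearly
# effect induces dependence across sites.
year_effect <- rnorm(n_years)
Y <- sapply(seq_len(n_sites), function(s) year_effect + rnorm(n_years))

# Resample whole years (rows) so every replicate keeps the cross-site
# dependence. The per-site mean is a stand-in for the Stage 1 MLE.
boot_stats <- t(replicate(B, {
  idx <- sample.int(n_years, replace = TRUE)
  colMeans(Y[idx, , drop = FALSE])
}))
W_bs <- cov(boot_stats)             # n_sites x n_sites bootstrap covariance

# Assumed Wendland-type taper: 1 at d = 0, exactly 0 for d >= lambda
wendland <- function(d, lambda) {
  h <- pmin(d / lambda, 1)
  (1 - h)^4 * (1 + 4 * h)
}

# Hypothetical 1-D site positions and their pairwise distances
pos <- c(0, 100, 250, 600, 900)
D_toy <- as.matrix(dist(pos))

# Elementwise product: entries for site pairs at least lambda apart become 0
W_toy <- W_bs * wendland(D_toy, lambda = 300)
```

Because the taper is itself a valid correlation function, the elementwise product stays positive semidefinite while gaining exact zeros.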

Stage 2: Joint Spatial Model

The joint model has p=6 dimensions (3 per source) with parameters:

beta (6x1): Mean vector for (mu_NOAA, log_sigma_NOAA, xi_NOAA, mu_ADCIRC, log_sigma_ADCIRC, xi_ADCIRC).
A (6x6 lower triangular): Cross-covariance factor such that the inter-parameter covariance at distance 0 is A A^T.
rho (6x1): Range parameters for the 6 latent exponential correlation functions.

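The observed parameter vector is modeled as theta_hat ~ N(beta ⊗ 1_L, Sigma + W_tap), where ⊗ is the Kronecker product, 1_L is a length-L vector of ones, Sigma is the spatial covariance implied by A and rho, and W_tap is the tapered bootstrap covariance. Each site contributes only its observed dimensions (NOAA: 1-3, ADCIRC: 4-6).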

model <- fit_spatial_model(stage1, coast_data, W_tap, D,
                           control = list(maxit = 2000, trace = 0))
model$optim_result$convergence
# [1] 0
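
For intuition about how A and rho determine the spatial covariance, the following base-R sketch builds an LMC covariance for a toy case with p = 2 parameters and 3 sites. The function lmc_cov, the parameter-major ordering, and all numeric values are assumptions made for illustration; the internal layout used by fit_spatial_model may differ.

```r
# Toy setup: p = 2 parameters, L = 3 sites
p <- 2; L <- 3
D_toy <- as.matrix(dist(c(0, 50, 200)))    # pairwise site distances
A <- matrix(c(1.0, 0.0,
              0.5, 0.8), nrow = p, byrow = TRUE)   # lower triangular
rho <- c(100, 400)                         # one range per latent process

# LMC covariance: sum over k of (a_k a_k^T) %x% R_k, where a_k is column k
# of A and R_k = exp(-D / rho_k) is an exponential correlation matrix.
# Ordering is parameter-major: all sites for parameter 1, then parameter 2.
lmc_cov <- function(A, rho, D) {
  n <- nrow(A) * nrow(D)
  Sigma <- matrix(0, n, n)
  for (k in seq_len(ncol(A))) {
    a_k <- A[, k, drop = FALSE]
    Sigma <- Sigma + kronecker(a_k %*% t(a_k), exp(-D / rho[k]))
  }
  Sigma
}
Sigma <- lmc_cov(A, rho, D_toy)

# Rows and columns belonging to site 1 across both parameters
site1 <- (seq_len(p) - 1) * L + 1
Sigma[site1, site1]    # matches A %*% t(A), the covariance at distance 0
```

Each latent process k contributes its own range rho[k], so different parameter pairs can decorrelate over different distances.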

Kriging and Return Levels

Predict at new locations. The kriging predictor fuses information from both NOAA and ADCIRC observations to produce NOAA-scale GEV parameters with reduced uncertainty.

# Predict at a few example coastal sites
new_sites <- data.frame(lon = c(-90.0, -81.5, -75.5), lat = c(30.0, 31.5, 36.8))
preds <- predict_krig(model, new_sites)
rl <- compute_return_levels(preds, r = 100)

# 100-year return levels at year-2000 reference conditions
rl$return_level
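
The r-year return level is the (1 - 1/r) quantile of the GEV distribution. As a rough sketch of how its uncertainty can be approximated by the delta method or by Monte Carlo, consider the base-R example below. The function gev_return_level and the values of theta and V are hypothetical; this is not the code behind compute_return_levels.

```r
# r-year return level: the (1 - 1/r) quantile of GEV(mu, sigma, xi)
gev_return_level <- function(mu, log_sigma, xi, r) {
  y <- -log(1 - 1 / r)
  sigma <- exp(log_sigma)
  if (abs(xi) < 1e-8) return(mu - sigma * log(y))   # Gumbel limit
  mu + sigma / xi * (y^(-xi) - 1)
}

# Hypothetical predicted mean and covariance of (mu, log sigma, xi)
theta <- c(1.2, log(0.25), 0.08)
V <- diag(c(0.05, 0.08, 0.04)^2)
r <- 100

rl_hat <- gev_return_level(theta[1], theta[2], theta[3], r)

# Delta method: Var(z_r) is approximately g' V g, with analytic gradient g
y <- -log(1 - 1 / r)
sigma <- exp(theta[2]); xi <- theta[3]
grad <- c(1,                                          # d / d mu
          sigma / xi * (y^(-xi) - 1),                 # d / d log sigma
          -sigma / xi^2 * (y^(-xi) - 1) -
            sigma / xi * y^(-xi) * log(y))            # d / d xi
se_delta <- sqrt(drop(t(grad) %*% V %*% grad))

# Monte Carlo: draw parameters from N(theta, V), transform each draw
set.seed(1)
n_draws <- 20000
draws <- matrix(rnorm(n_draws * 3), n_draws, 3) %*% chol(V)
draws <- sweep(draws, 2, theta, "+")
rl_mc <- mapply(gev_return_level, draws[, 1], draws[, 2], draws[, 3],
                MoreArgs = list(r = r))
se_mc <- sd(rl_mc)
ci_mc <- quantile(rl_mc, c(0.025, 0.975))
```

The delta method is a linearization, so it yields a symmetric interval; the Monte Carlo quantiles can capture skewness in the return-level distribution.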

Model Validation: Leave-One-Out CV

The closed-form block LOO-CV (Rasmussen & Williams, 2006, eq. 5.12) evaluates predictive performance at each NOAA site without refitting.

loo <- loo_cv(model)
sum_loo <- loo_summary(loo, r = 100)

# Total log predictive density
sum_loo$total_lpd

# 100-year return level RMSE
sum_loo$rl_rmse
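
The closed-form idea can be sketched for a zero-mean Gaussian vector y ~ N(0, K), where K plays the role of the full model covariance. Holding out a block of entries, the predictive mean and covariance follow directly from the precision matrix, with no refitting. The function loo_block and the toy covariance below are assumptions for illustration, not the internals of loo_cv.

```r
# Leave-block-out prediction for y ~ N(0, K), using Q = K^{-1}:
#   predictive covariance of block idx: solve(Q[idx, idx])
#   predictive mean of block idx:       y[idx] - solve(Q[idx, idx], (Q y)[idx])
loo_block <- function(K, y, idx) {
  Q <- solve(K)
  Q_bb <- Q[idx, idx, drop = FALSE]
  list(mean = drop(y[idx] - solve(Q_bb, (Q %*% y)[idx])),
       cov  = solve(Q_bb))
}

# Hypothetical example: 4 sites x 2 parameters, site-major ordering
set.seed(3)
n <- 4 * 2
M <- matrix(rnorm(n^2), n)
K <- crossprod(M) + diag(n)               # a valid covariance matrix
y <- drop(t(chol(K)) %*% rnorm(n))        # one draw from N(0, K)

idx <- 3:4                                # both parameters of site 2
pred <- loo_block(K, y, idx)

# Log predictive density of the held-out block under N(mean, cov)
resid <- y[idx] - pred$mean
lpd <- -0.5 * (length(idx) * log(2 * pi) +
               as.numeric(determinant(pred$cov)$modulus) +
               drop(t(resid) %*% solve(pred$cov, resid)))
```

Summing such block-wise log predictive densities over held-out sites gives a total of the kind reported as total_lpd.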

Custom Data Sources

The source_params argument generalizes the framework beyond NOAA/ADCIRC. Each entry maps a data source name to its parameter indices in the joint model:

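# Example: 3 sources, each observing 3 GEV parameters
# (df is user-supplied input data)
dat <- load_data(df, source_params = list(
  gauge      = 1:3,
  satellite  = 4:6,
  reanalysis = 7:9
))

# Fit a 9-dimensional joint model.
# stage1, W_tap, and D must first be recomputed from dat, as in the sections above.
model <- fit_spatial_model(stage1, dat, W_tap, D)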
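
To make the mapping concrete, the base-R sketch below derives the joint dimension and a site-by-dimension observation mask from a source_params list. The sites table and the obs matrix are hypothetical constructs for this example; how the package represents this internally is not shown here.

```r
source_params <- list(gauge = 1:3, satellite = 4:6, reanalysis = 7:9)

# Dimension of the joint model implied by the mapping
p <- max(unlist(source_params))

# Hypothetical site table: one row per site, labelled by its source
sites <- data.frame(id = 1:4,
                    source = c("gauge", "satellite", "gauge", "reanalysis"))

# obs[i, j] is TRUE when site i observes joint dimension j
obs <- t(sapply(sites$source,
                function(s) seq_len(p) %in% source_params[[s]]))
rownames(obs) <- sites$id
```

Here each site observes exactly three of the nine joint dimensions, determined by its source.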

References

White, B. N., Blanton, B., Luettich, R., & Smith, R. L. (2026). Fusing Sparse Observations and Dense Simulations for Spatial Extreme Value Analysis: Application to U.S. Coastal Sea Levels. arXiv preprint arXiv:2603.03247.

Russell, B. T., Risser, M. D., Smith, R. L., & Kunkel, K. E. (2020). Investigating the association between late spring Gulf of Mexico sea surface temperatures and U.S. Gulf Coast precipitation extremes with focus on Hurricane Harvey. Environmetrics, 31(2), e2595.

Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.