Nonparametric prediction intervals can be computed for fitted regression workflow objects using the CV+ conformal inference method described by Barber et al. (2021).

Usage

int_conformal_cv(object, ...)

# S3 method for default
int_conformal_cv(object, ...)

# S3 method for resample_results
int_conformal_cv(object, ...)

# S3 method for tune_results
int_conformal_cv(object, parameters, ...)

Arguments

object

An object from a tidymodels resampling or tuning function such as tune::fit_resamples(), tune::tune_grid(), or similar. The object must have been produced so that the .extracts column contains the fitted workflow for each resample (see the Examples below).

...

Not currently used.

parameters

A tibble of tuning parameter values that can be used to filter the predicted values before processing. This tibble should select a single set of hyperparameter values from the tuning results. It is required only when a tuning object is passed to object.

Value

An object of class "int_conformal_cv" containing the information to create intervals. The predict() method is used to produce the intervals.

Details

This function implements the CV+ method found in Section 3 of Barber et al. (2021). It uses the resampled model fits and their associated holdout residuals to make prediction intervals for regression models.

This function prepares the objects for the computations. The predict() method computes the intervals for new data.

This method was developed for V-fold cross-validation (no repeats). Interval coverage is unknown for any other resampling method. The function will not stop the computations for other types of resamples, but there is no way of knowing whether the results are appropriate.
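To make the CV+ idea described above concrete, here is a minimal base R sketch of the interval construction as formulated in the paper. It is an illustration under stated assumptions, not this function's internal code: the simulated data, the use of lm() as the model, and the helper cv_plus_interval() are all hypothetical, and the package's own computation may differ in detail.

```r
set.seed(1)
n <- 100  # training set size
v <- 5    # number of folds

# Simulated training data and new points (illustrative assumption)
train <- data.frame(x = runif(n, -2, 2))
train$y <- 1 + 2 * train$x + rnorm(n)
new_x <- data.frame(x = c(-1, 0, 1))

# Assign each training row to one of v folds
fold_id <- sample(rep(seq_len(v), length.out = n))

# One model per fold, each fitted with that fold held out
fold_fits <- lapply(seq_len(v), function(k) {
  lm(y ~ x, data = train[fold_id != k, ])
})

# Absolute holdout residual for each training row, computed with the
# model that did not see that row
abs_resid <- numeric(n)
for (k in seq_len(v)) {
  held_out <- fold_id == k
  held_pred <- predict(fold_fits[[k]], train[held_out, , drop = FALSE])
  abs_resid[held_out] <- abs(train$y[held_out] - held_pred)
}

# Predictions at the new points: one row per new point, one column per fold
new_pred <- vapply(
  fold_fits,
  function(fit) predict(fit, newdata = new_x),
  numeric(nrow(new_x))
)

# Hypothetical helper: CV+ bounds for a single new point
cv_plus_interval <- function(new_row_pred, fold_id, abs_resid, alpha) {
  n <- length(abs_resid)
  # For training row i, use the new-point prediction from the model
  # that left out the fold containing row i
  mu_i <- new_row_pred[fold_id]
  lo_rank <- floor(alpha * (n + 1))
  hi_rank <- ceiling((1 - alpha) * (n + 1))
  lower <- if (lo_rank < 1) -Inf else sort(mu_i - abs_resid)[lo_rank]
  upper <- if (hi_rank > n) Inf else sort(mu_i + abs_resid)[hi_rank]
  c(lower = lower, upper = upper)
}

alpha <- 0.10  # corresponds to a 90% interval
intervals <- t(apply(
  new_pred, 1, cv_plus_interval,
  fold_id = fold_id, abs_resid = abs_resid, alpha = alpha
))
intervals
```

Each row of intervals holds the lower and upper bound for one new point. The sketch also shows why the fitted model from every fold must be retained: every new-data prediction is paired with holdout residuals from the model that produced it.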

References

Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani (2021). "Predictive inference with the jackknife+." The Annals of Statistics, 49(1), 486-507.

Examples

library(workflows)
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)

set.seed(2)
sim_train <- sim_regression(200)
sim_new <- sim_regression(5) %>% select(-outcome)

sim_rs <- vfold_cv(sim_train)

# We'll use a neural network model
mlp_spec <-
  mlp(hidden_units = 5, penalty = 0.01) %>%
  set_mode("regression")

# Use a control function that saves the predictions as well as the models.
# Consider using the butcher package in the extracts function to have smaller
# object sizes

ctrl <- control_resamples(save_pred = TRUE, extract = I)

set.seed(3)
nnet_res <-
  mlp_spec %>%
  fit_resamples(outcome ~ ., resamples = sim_rs, control = ctrl)

nnet_int_obj <- int_conformal_cv(nnet_res)
nnet_int_obj
#> Conformal inference via CV+
#> preprocessor: formula 
#> model: mlp (engine = nnet) 
#> number of models: 10 
#> training set size: 200 
#> 
#> Use `predict(object, new_data, level)` to compute prediction intervals

predict(nnet_int_obj, sim_new)
#> # A tibble: 5 × 3
#>   .pred_lower   .pred .pred_upper
#>         <dbl>   <dbl>       <dbl>
#> 1        3.67 42.8           81.9
#> 2      -29.6   9.47          48.6
#> 3      -14.6  24.5           63.6
#> 4      -39.2  -0.0793        39.0
#> 5      -11.1  28.0           67.1
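The tune_results method and the level argument are not exercised above, so the following sketch shows how they might be used. It assumes that int_conformal_cv() is provided by the probably package, that tune::select_best() returns a tibble suitable for the parameters argument, and that control_grid() accepts the same save_pred and extract options as control_resamples(). The object names are illustrative and no output is shown.

```r
library(workflows)
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)
library(probably)  # assumed home of int_conformal_cv()

set.seed(2)
sim_train <- sim_regression(200)
sim_new <- sim_regression(5) %>% select(-outcome)
sim_rs <- vfold_cv(sim_train)

# Mark the penalty for tuning instead of fixing it
mlp_tune_spec <-
  mlp(hidden_units = 5, penalty = tune()) %>%
  set_mode("regression")

# Save the holdout predictions and the fitted workflows, as before
grid_ctrl <- control_grid(save_pred = TRUE, extract = I)

set.seed(3)
nnet_tune_res <-
  mlp_tune_spec %>%
  tune_grid(outcome ~ ., resamples = sim_rs, grid = 3, control = grid_ctrl)

# Select a single set of tuning parameter values to pass to `parameters`
best_param <- select_best(nnet_tune_res, metric = "rmse")

tuned_int_obj <- int_conformal_cv(nnet_tune_res, parameters = best_param)

# Request 90% intervals through the `level` argument
tuned_int <- predict(tuned_int_obj, sim_new, level = 0.90)
tuned_int
```

Any one-row tibble of tuning parameter values that identifies a single candidate in the tuning results should serve the same purpose as best_param here.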