Nonparametric prediction intervals can be computed for fitted regression workflow objects using the CV+ conformal inference method described by Barber et al. (2021).
Usage
int_conformal_cv(object, ...)
# S3 method for default
int_conformal_cv(object, ...)
# S3 method for resample_results
int_conformal_cv(object, ...)
# S3 method for tune_results
int_conformal_cv(object, parameters, ...)
Arguments
- object
An object from a tidymodels resampling or tuning function such as tune::fit_resamples(), tune::tune_grid(), or similar. The object should have been produced in a way that the .extracts column contains the fitted workflow for each resample (see the Examples below).
- ...
Not currently used.
- parameters
A tibble of tuning parameter values that can be used to filter the predicted values before processing. This tibble should select a single set of hyper-parameter values from the tuning results. This is only required when a tuning object is passed to object.
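To illustrate the parameters argument, the following sketch tunes a model over a small grid, picks one candidate with tune::select_best(), and passes that one-row tibble to int_conformal_cv(). The data, the tuned parameter, the grid size, and the object names are illustrative assumptions, and the sketch assumes that int_conformal_cv() is provided by the probably package.

```r
library(probably) # assumed home of int_conformal_cv()
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)

set.seed(2)
sim_train <- sim_regression(200)
sim_new <- sim_regression(5) %>% select(-outcome)
sim_rs <- vfold_cv(sim_train, v = 5)

# Mark the penalty for tuning (an illustrative choice)
mlp_tune_spec <-
  mlp(hidden_units = 5, penalty = tune()) %>%
  set_mode("regression")

# As with fit_resamples(), keep the predictions and the fitted workflows
grid_ctrl <- control_grid(save_pred = TRUE, extract = I)

set.seed(3)
mlp_tune_res <-
  mlp_tune_spec %>%
  tune_grid(outcome ~ ., resamples = sim_rs, grid = 3, control = grid_ctrl)

# A one-row tibble that identifies a single candidate
best_param <- select_best(mlp_tune_res, metric = "rmse")

mlp_int_obj <- int_conformal_cv(mlp_tune_res, parameters = best_param)
mlp_int <- predict(mlp_int_obj, sim_new, level = 0.90)
mlp_int
```

A tibble that matches more than one candidate does not satisfy the requirement above, so it is safest to pass exactly one row.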
Value
An object of class "int_conformal_cv" containing the information to create intervals. The predict() method is used to produce the intervals.
Details
This function implements the CV+ method found in Section 3 of Barber et al. (2021). It uses the resampled model fits and their associated holdout residuals to make prediction intervals for regression models.
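For reference, the CV+ interval can be written as follows. This is a summary of the definition in the paper, in notation chosen here: the training indices are split into folds S_1, ..., S_K, k(i) is the fold containing observation i, and \hat{\mu}_{-S_k} is the model fitted without fold k. It is assumed that alpha corresponds to one minus the level passed to predict(). The package's internal computations may differ in detail.

```latex
% Holdout residual for training point i
R_i^{\mathrm{CV}} = \bigl| Y_i - \hat{\mu}_{-S_{k(i)}}(X_i) \bigr|, \qquad i = 1, \dots, n

% CV+ prediction interval at a new point x
\hat{C}^{\mathrm{CV+}}_{n,K,\alpha}(x) =
\Bigl[\,
  \hat{q}^{\,-}_{n,\alpha}\bigl\{ \hat{\mu}_{-S_{k(i)}}(x) - R_i^{\mathrm{CV}} \bigr\},\;
  \hat{q}^{\,+}_{n,\alpha}\bigl\{ \hat{\mu}_{-S_{k(i)}}(x) + R_i^{\mathrm{CV}} \bigr\}
\,\Bigr]

% Order statistics over the n values v_1, ..., v_n
\hat{q}^{\,+}_{n,\alpha}\{v_i\} = \text{the } \lceil (1-\alpha)(n+1) \rceil\text{-th smallest value of } v_1, \dots, v_n
\hat{q}^{\,-}_{n,\alpha}\{v_i\} = \text{the } \lfloor \alpha(n+1) \rfloor\text{-th smallest value of } v_1, \dots, v_n
```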
This function prepares the objects for the computations. The predict() method computes the intervals for new data.
This method was developed for V-fold cross-validation (no repeats). Interval coverage is unknown for any other resampling method. The function will not stop the computations for other types of resamples, but we have no way of knowing whether the results are appropriate.
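To make the role of the resampled fits and holdout residuals concrete, here is a minimal base R sketch of the CV+ computation as defined in the paper, using lm() as the model. The helper cv_plus_lm() and the simulated data are hypothetical and exist only for illustration. This is not the package's implementation, which works with fitted workflows and may compute the bounds differently.

```r
# Hypothetical helper: CV+ bounds for new data with lm() as the base model
cv_plus_lm <- function(formula, data, new_data, v = 10, level = 0.90) {
  n <- nrow(data)
  alpha <- 1 - level
  y <- model.response(model.frame(formula, data))

  # Randomly assign each training row to one of v folds (no repeats)
  fold_id <- sample(rep(seq_len(v), length.out = n))

  abs_resid <- numeric(n)
  # new_pred[j, i]: prediction at new point j from the model that held out row i
  new_pred <- matrix(NA_real_, nrow = nrow(new_data), ncol = n)

  for (k in seq_len(v)) {
    in_fold <- fold_id == k
    fit <- lm(formula, data = data[!in_fold, , drop = FALSE])
    # Absolute holdout residuals for rows this fit never saw
    abs_resid[in_fold] <-
      abs(y[in_fold] - predict(fit, data[in_fold, , drop = FALSE]))
    # Every held-out row shares this fit's predictions at the new points
    new_pred[, in_fold] <- predict(fit, new_data)
  }

  # Ranks of the order statistics; outside this range the bounds are infinite
  lo_rank <- floor(alpha * (n + 1))
  hi_rank <- ceiling((1 - alpha) * (n + 1))
  stopifnot(lo_rank >= 1, hi_rank <= n)

  # Shift each column by its own residual, then take order statistics by row
  lower <- apply(sweep(new_pred, 2, abs_resid, "-"), 1,
                 function(z) sort(z)[lo_rank])
  upper <- apply(sweep(new_pred, 2, abs_resid, "+"), 1,
                 function(z) sort(z)[hi_rank])

  data.frame(.pred_lower = lower, .pred_upper = upper)
}

set.seed(1)
toy_train <- data.frame(x = runif(200))
toy_train$y <- 2 * toy_train$x + rnorm(200, sd = 0.5)
toy_new <- data.frame(x = c(0.1, 0.5, 0.9))

cv_int <- cv_plus_lm(y ~ x, toy_train, toy_new, v = 10, level = 0.90)
cv_int
```

Each training row is held out exactly once, which is why the sketch uses plain V-fold cross-validation without repeats.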
References
Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, Ryan J. Tibshirani. "Predictive inference with the jackknife+." The Annals of Statistics, 49(1), 486-507, 2021.
Examples
library(probably)
library(workflows)
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)
set.seed(2)
sim_train <- sim_regression(200)
sim_new <- sim_regression(5) %>% select(-outcome)
sim_rs <- vfold_cv(sim_train)
# We'll use a neural network model
mlp_spec <-
  mlp(hidden_units = 5, penalty = 0.01) %>%
  set_mode("regression")
# Use a control function that saves the predictions as well as the models.
# Consider using the butcher package in the extracts function to have smaller
# object sizes
ctrl <- control_resamples(save_pred = TRUE, extract = I)
set.seed(3)
nnet_res <-
  mlp_spec %>%
  fit_resamples(outcome ~ ., resamples = sim_rs, control = ctrl)
nnet_int_obj <- int_conformal_cv(nnet_res)
nnet_int_obj
#> Conformal inference via CV+
#> preprocessor: formula
#> model: mlp (engine = nnet)
#> number of models: 10
#> training set size: 200
#>
#> Use `predict(object, new_data, level)` to compute prediction intervals
predict(nnet_int_obj, sim_new)
#> # A tibble: 5 × 3
#> .pred_lower .pred .pred_upper
#> <dbl> <dbl> <dbl>
#> 1 3.67 42.8 81.9
#> 2 -29.6 9.47 48.6
#> 3 -14.6 24.5 63.6
#> 4 -39.2 -0.0793 39.0
#> 5 -11.1 28.0 67.1
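The comments in the example suggest using the butcher package in the extract function, and the printed object mentions a level argument for predict(). The following sketch shows one way both might look. It assumes that butcher::butcher() can be applied to a fitted workflow and that the reduced workflow can still predict. A linear regression is swapped in for the neural network to keep the fits fast, and all object names are illustrative.

```r
library(probably) # assumed home of int_conformal_cv()
library(dplyr)
library(parsnip)
library(rsample)
library(tune)
library(modeldata)

set.seed(2)
sim_train <- sim_regression(200)
sim_new <- sim_regression(5) %>% select(-outcome)
sim_rs <- vfold_cv(sim_train)

# Extract a butchered copy of each fitted workflow to reduce object size
# (assumes the butcher package is installed)
slim_ctrl <-
  control_resamples(
    save_pred = TRUE,
    extract = function(x) butcher::butcher(x)
  )

lm_res <-
  linear_reg() %>%
  fit_resamples(outcome ~ ., resamples = sim_rs, control = slim_ctrl)

lm_int_obj <- int_conformal_cv(lm_res)

# The same new data at two different levels
int_80 <- predict(lm_int_obj, sim_new, level = 0.80)
int_99 <- predict(lm_int_obj, sim_new, level = 0.99)

# Interval widths at each level
int_80$.pred_upper - int_80$.pred_lower
int_99$.pred_upper - int_99$.pred_lower
```

A higher level should give intervals that are at least as wide, so comparing the two sets of widths is a quick sanity check.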