Uses a logistic regression model to calibrate probabilities
Source: R/cal-estimate-logistic.R
Usage
cal_estimate_logistic(
  .data,
  truth = NULL,
  estimate = dplyr::starts_with(".pred_"),
  smooth = TRUE,
  parameters = NULL,
  ...
)

# S3 method for data.frame
cal_estimate_logistic(
  .data,
  truth = NULL,
  estimate = dplyr::starts_with(".pred_"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

# S3 method for tune_results
cal_estimate_logistic(
  .data,
  truth = NULL,
  estimate = dplyr::starts_with(".pred_"),
  smooth = TRUE,
  parameters = NULL,
  ...
)

# S3 method for grouped_df
cal_estimate_logistic(
  .data,
  truth = NULL,
  estimate = NULL,
  smooth = TRUE,
  parameters = NULL,
  ...
)
Arguments
- .data
An ungrouped data.frame object, or tune_results object, that contains predictions and probability columns.
- truth
The column identifier for the true class results (that is a factor). This should be an unquoted column name.
- estimate
A vector of column identifiers, or one of the dplyr selector functions, to choose which variables contain the class probabilities. It defaults to the prefix used by tidymodels (.pred_). The order of the identifiers will be considered the same as the order of the levels of the truth variable.
- smooth
Applies to the logistic models. It switches between a logistic spline when TRUE and simple logistic regression when FALSE.
- parameters
(Optional) A tibble of tuning parameter values that can be used to filter the predicted values before processing. Applies only to tune_results objects.
- ...
Additional arguments passed to the models or routines used to calculate the new probabilities.
- .by
The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Defaults to NULL. When .by = NULL, no grouping will take place.
Details
This function uses existing modeling functions from other packages to create the calibration:
- stats::glm() is used when smooth is set to FALSE
- mgcv::gam() is used when smooth is set to TRUE
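To make the difference between the two back ends concrete, here is a minimal sketch that calls stats::glm() and mgcv::gam() directly on simulated data. It is not the package's internal code: the data frame `dat`, its columns `truth` and `p_raw`, and the model formulas are assumptions chosen for illustration only.

```r
library(mgcv) # ships with R as a recommended package

set.seed(123)
n <- 500

# Hypothetical uncalibrated probability for the event class
p_raw <- runif(n)

# Simulated outcome: the true event rate is p_raw^2, so p_raw overestimates it
truth <- rbinom(n, size = 1, prob = p_raw^2)
dat <- data.frame(truth = truth, p_raw = p_raw)

# Analogue of smooth = FALSE: simple logistic regression on the raw probability
fit_glm <- glm(truth ~ p_raw, data = dat, family = binomial())

# Analogue of smooth = TRUE: logistic model with a spline term
fit_gam <- gam(truth ~ s(p_raw), data = dat, family = binomial())

# Calibrated probabilities from each model
dat$p_glm <- predict(fit_glm, newdata = dat, type = "response")
dat$p_gam <- as.numeric(predict(fit_gam, newdata = dat, type = "response"))

head(dat)
```

The spline term lets the calibration curve bend where a single slope and intercept cannot, at the cost of more parameters to estimate.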
Multiclass Extension
This method has not been extended to multiclass outcomes. However, the
natural multiclass extension is cal_estimate_multinomial().
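As a rough illustration of the multiclass route, the sketch below calls cal_estimate_multinomial() on species_probs, a three-class example data set assumed to be available from the probably package. The probability column names are passed explicitly on the assumption that they follow the `.pred_<level>` pattern, and smooth = FALSE is used only to keep the sketch light.

```r
library(probably)

# species_probs is assumed to hold a factor `Species` and one
# probability column per class level
cal_multi <- cal_estimate_multinomial(
  species_probs,
  truth = Species,
  estimate = c(.pred_bobcat, .pred_coyote, .pred_gray_fox),
  smooth = FALSE
)

cal_multi
```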
Examples
# It will automatically identify the probability columns
# if passed a model fitted with tidymodels
cal_estimate_logistic(segment_logistic, Class)
#>
#> ── Probability Calibration
#> Method: Generalized additive model
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor
# Specify the variable names in a vector of unquoted names
cal_estimate_logistic(segment_logistic, Class, c(.pred_poor, .pred_good))
#>
#> ── Probability Calibration
#> Method: Generalized additive model
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor
# dplyr selector functions are also supported
cal_estimate_logistic(segment_logistic, Class, dplyr::starts_with(".pred_"))
#>
#> ── Probability Calibration
#> Method: Generalized additive model
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor
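The examples above all use the default smooth = TRUE. The following sketch shows the two remaining documented options, smooth = FALSE and .by. The grouping column `batch` is made up for illustration and does not exist in segment_logistic; printed output is omitted here.

```r
library(probably)
library(dplyr)

# smooth = FALSE fits a simple logistic regression via stats::glm()
cal_glm <- cal_estimate_logistic(segment_logistic, Class, smooth = FALSE)
cal_glm

# Add a hypothetical qualitative grouping column
set.seed(1)
segment_batches <- segment_logistic %>%
  mutate(batch = sample(c("a", "b"), n(), replace = TRUE))

# .by estimates a separate calibration for each value of `batch`
cal_by <- cal_estimate_logistic(segment_batches, Class, .by = batch)
cal_by
```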