Dependent Variable Regression Ensemble
dvr_ensemble(formula, data, method = "lm", n_predictions, n_train_points, score_set = c("all", "test"), error_agg_fun = mean, scores_only = TRUE, ...)
Argument | Description |
---|---|
formula | a formula specifying the model (see help("lm") for more detail) |
data | a matrix or data.frame containing the variables in the model |
method | a model function (e.g. "lm", "randomForest") |
n_predictions | an integer specifying the number of components in the ensemble. If score_set = "test", set this high enough to ensure all points are predicted a sufficient number of times |
n_train_points | an integer or numeric value specifying the number of rows used to train each component of the ensemble |
score_set | one of "all" or "test". If "all", all N points are scored in each training iteration. If "test", only the out-of-sample points are scored in each iteration. |
error_agg_fun | a function for combining the squared prediction errors. Defaults to mean. |
scores_only | logical. If TRUE, return a vector of outlier scores. If FALSE, return the error matrix and the outlier scores. |
If scores_only = TRUE, a vector of outlier scores. If FALSE, a list containing the outlier scores and the ensemble error matrix.
Section 3.2.1 of "Outlier Analysis" (C. C. Aggarwal. Outlier Analysis. Springer, 2017.)
dvr_ensemble(formula = Sepal.Length ~ ., data = iris[, -5], n_predictions = 100, n_train_points = 100, error_agg_fun = median, scores_only = TRUE)
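
The procedure described by the arguments above can be sketched in base R. This is only an illustration of the idea (fit a model on a random subset of rows, record squared prediction errors, aggregate them per point), not the package's actual implementation. The helper name sketch_dvr_scores is hypothetical, the model is fixed to lm, and n_train_points is assumed to be an integer row count.

```r
# Hypothetical helper for illustration; not part of the package.
# Assumes `data` has no missing values and n_train_points is a row count.
sketch_dvr_scores <- function(formula, data, n_predictions, n_train_points,
                              score_set = c("all", "test"),
                              error_agg_fun = mean) {
  score_set <- match.arg(score_set)
  data <- as.data.frame(data)
  n <- nrow(data)

  # Observed values of the dependent variable for every row.
  y <- model.response(model.frame(formula, data))

  # One row per data point, one column per ensemble component.
  errors <- matrix(NA_real_, nrow = n, ncol = n_predictions)

  for (j in seq_len(n_predictions)) {
    # Draw the training rows for this component without replacement.
    train_idx <- sample.int(n, n_train_points)
    fit <- lm(formula, data = data[train_idx, , drop = FALSE])

    # Squared prediction error for every row.
    sq_err <- (y - predict(fit, newdata = data))^2

    # With score_set = "test", keep only out-of-sample errors.
    if (score_set == "test") sq_err[train_idx] <- NA_real_

    errors[, j] <- sq_err
  }

  # Combine each point's available errors into a single outlier score.
  apply(errors, 1, function(e) error_agg_fun(e[!is.na(e)]))
}

set.seed(1)
scores <- sketch_dvr_scores(Sepal.Length ~ ., iris[, -5],
                            n_predictions = 100, n_train_points = 100,
                            score_set = "test", error_agg_fun = median)

# Row indices of the points with the largest aggregated errors.
head(order(scores, decreasing = TRUE))
```

With score_set = "test", a point contributes an error only in iterations where it was left out of training, which is why n_predictions should be large enough for every point to be scored several times.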