Applies a user-specified model-fitting function to each element of a list-column
of datasets in .data, fitting models in parallel with a progress bar, and returns
the original data frame with a new model column containing the fitted models.
Usage
fit_models(
  .data,
  .x,
  .f,
  packages = NULL,
  n_cores = parallel::detectCores() - 1
)
Arguments
- .data
A data frame containing a list-column of datasets to which the model function will be applied.
- .x
Unquoted column name of the list-column containing the datasets.
- .f
A function or formula to apply to each dataset to fit the desired model (e.g., ~ lm(y ~ x, data = .) or ~ lme4::lmer(y ~ x + (x | group), data = .)).
- packages
A character vector of package names to load on each parallel worker, if your model-fitting function requires additional packages. Defaults to NULL.
- n_cores
Number of cores to use for parallel processing. Defaults to parallel::detectCores() - 1.
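To illustrate the formula form of .f, the sketch below assumes the formula is converted purrr-style (via purrr::as_mapper(), which is an assumption and not something this page confirms), so that . refers to the dataset currently being fitted. Under that assumption, the formula and an explicit function fit the same model. The data frame d is invented for the example.

```r
library(purrr)

# Toy dataset, invented for illustration only
set.seed(42)
d <- data.frame(x = 1:20)
d$y <- 1 + 0.5 * d$x + rnorm(20)

# Formula shorthand: `.` stands for the dataset passed in
# (assumes purrr-style conversion; the internals of fit_models() are not shown here)
fit_formula <- as_mapper(~ lm(y ~ x, data = .))

# Equivalent explicit function
fit_function <- function(data) lm(y ~ x, data = data)

m1 <- fit_formula(d)
m2 <- fit_function(d)

# Compare the coefficients from the two forms
same_coefs <- isTRUE(all.equal(coef(m1), coef(m2)))
```

An explicit function may be easier to debug on parallel workers, since its arguments are named; the formula form is more compact.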
Value
The original .data data frame with an additional model column containing
the fitted model objects returned by .f.
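The shape of the return value can be pictured with a base R stand-in. The sketch below does not call fit_models(); it builds a small data frame by hand with a data list-column and a model list-column (the names sims, data, and the toy model are assumptions for illustration) and shows one way to pull a coefficient out of each fitted object.

```r
# Hand-built stand-in for the returned data frame (not produced by fit_models())
set.seed(1)
sims <- data.frame(id = 1:3)

# List-column of toy datasets
sims$data <- lapply(1:3, function(i) {
  x <- 1:10
  data.frame(x = x, y = 2 * x + rnorm(10))
})

# List-column of fitted models, one per dataset
sims$model <- lapply(sims$data, function(d) lm(y ~ x, data = d))

# Each element of `model` is an ordinary model object
slopes <- vapply(sims$model, function(m) coef(m)[["x"]], numeric(1))
```

Because each element is a regular fitted model, the usual methods (summary(), coef(), predict()) apply element by element.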
Details
This function is intended for use in simulation pipelines where multiple datasets
are generated (e.g., via simulate_datasets()), and models need to be fitted to
each dataset efficiently in parallel.
It uses pbapply::pblapply() to provide a progress bar during model fitting,
and parallel::makeCluster() for multi-core processing.
Packages specified in packages will be loaded on each worker to ensure model-fitting
functions that depend on those packages work correctly in parallel.
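The combination described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's source: the helper name fit_in_parallel() is hypothetical, and the guard on n_cores is an addition (on a single-core machine parallel::detectCores() - 1 would be 0, which is not a valid cluster size).

```r
library(parallel)

# Hypothetical helper, for illustration only
fit_in_parallel <- function(datasets, fit_fun, packages = NULL, n_cores = 2) {
  # Guard against 0 or NA cores (an added safeguard, not taken from the package)
  n_cores <- max(1L, as.integer(n_cores), na.rm = TRUE)

  cl <- parallel::makeCluster(n_cores)
  # Always shut the workers down, even if fitting fails
  on.exit(parallel::stopCluster(cl), add = TRUE)

  # Load the requested packages on every worker
  if (!is.null(packages)) {
    parallel::clusterCall(cl, function(pkgs) {
      for (p in pkgs) library(p, character.only = TRUE)
      invisible(NULL)
    }, packages)
  }

  # pblapply() runs fit_fun over the datasets on the cluster with a progress bar
  pbapply::pblapply(datasets, fit_fun, cl = cl)
}

# Usage with toy datasets and a plain lm() fit
set.seed(123)
toy <- lapply(1:4, function(i) data.frame(x = 1:15, y = 3 + 1:15 + rnorm(15)))
fits <- fit_in_parallel(toy, function(d) lm(y ~ x, data = d), n_cores = 2)
```

Loading packages with library() on each worker matters because workers started by makeCluster() are fresh R sessions and do not inherit the packages attached in the calling session.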
Examples
library(dplyr)
library(purrr)
library(lme4)
# Create example grouped datasets for mixed models
datasets <- tibble(
  id = 1:5,
  data = map(1:5, ~ {
    df <- sleepstudy[sample(nrow(sleepstudy), 50, replace = TRUE), ]
    df$Subject <- factor(df$Subject)
    df
  })
)
# Fit linear mixed models in parallel
fitted_models <- fit_models(
  datasets,
  .x = data,
  .f = ~ lme4::lmer(Reaction ~ Days + (Days | Subject), data = .),
  packages = c("lme4")
)
# Inspect the first fitted mixed model
summary(fitted_models$model[[1]])
# Tidy the fitted models using extract_model_results() for further evaluation
extracted <- extract_model_results(fitted_models)
head(extracted)
# Summarise estimates for 'Days' across simulated fits
extracted |>
  filter(term == "Days") |>
  evaluate_model_results(
    mean_estimate = mean(estimate, na.rm = TRUE),
    sd_estimate = sd(estimate, na.rm = TRUE)
  )