Fit models in parallel across a list-column of datasets

Applies a user-specified model-fitting function to each element of a list-column of datasets in .data. Models are fitted in parallel with a progress bar, and the original data frame is returned with a new model column containing the fitted models.

Usage

fit_models(
  .data,
  .x,
  .f,
  packages = NULL,
  n_cores = parallel::detectCores() - 1
)

Arguments

.data: A data frame containing a list-column of datasets to which the model function will be applied.
.x: Unquoted column name of the list-column containing the datasets.
.f: A function or formula to apply to each dataset to fit the desired model (e.g., ~ lm(y ~ x, data = .) or ~ lme4::lmer(y ~ x + (x | group), data = .)).
packages: A character vector of package names to load on each parallel worker, if your model-fitting function requires additional packages. Defaults to NULL.
n_cores: Number of cores to use for parallel processing. Defaults to parallel::detectCores() - 1.
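
A note on the n_cores default: parallel::detectCores() can return NA on platforms where the core count cannot be determined, and detectCores() - 1 evaluates to 0 on a single-core machine. Whether fit_models() guards against these cases internally is not stated here, so the following is only a sketch of how a caller might compute a safe value to pass explicitly. The variable names are illustrative.

```r
# Illustrative only: derive a worker count that is always at least 1.
cores <- parallel::detectCores()

# detectCores() may be NA, and cores - 1 may be 0 on a single-core machine.
n_cores <- if (is.na(cores)) 1L else max(1L, cores - 1L)
```

The result can then be supplied as fit_models(..., n_cores = n_cores).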

Value

The original .data data frame with an additional model column containing the fitted model objects returned by .f.

Details

This function is intended for use in simulation pipelines where multiple datasets are generated (e.g., via simulate_datasets()), and models need to be fitted to each dataset efficiently in parallel.

It uses pbapply::pblapply() to provide a progress bar during model fitting, and parallel::makeCluster() for multi-core processing.

Packages named in the packages argument are loaded on each worker, so that model-fitting functions that depend on them work correctly in parallel.
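
The mechanism described above can be sketched roughly as follows. This is not the function's actual source: it is a minimal illustration under stated assumptions, and it substitutes parallel::parLapply() for pbapply::pblapply() so that it runs with base R alone (pblapply() accepts the same cluster through its cl argument and adds the progress bar). The objects datasets, fit_one, and packages are hypothetical stand-ins.

```r
library(parallel)

# Hypothetical stand-in for a list-column of datasets
datasets <- lapply(1:4, function(i) {
  data.frame(x = 1:20, y = 2 * (1:20) + i)
})

# Hypothetical model-fitting function, playing the role of .f
fit_one <- function(d) lm(y ~ x, data = d)

# Packages to attach on every worker (stats is used purely for illustration)
packages <- c("stats")

cl <- makeCluster(2)

# Attach the requested packages on each worker before any model is fitted
invisible(clusterCall(cl, function(pkgs) {
  for (p in pkgs) library(p, character.only = TRUE)
  NULL
}, packages))

# Fit one model per dataset; the cluster is stopped even if a fit fails
models <- tryCatch(
  parLapply(cl, datasets, fit_one),
  finally = stopCluster(cl)
)
```

Stopping the cluster in a finally clause avoids leaving idle worker processes behind when one of the fits raises an error.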

Examples

library(dplyr)
library(purrr)
library(lme4)

# Create example grouped datasets for mixed models (sleepstudy ships with lme4)
set.seed(123)  # make the resampled datasets reproducible
datasets <- tibble(
  id = 1:5,
  data = map(1:5, ~ {
    df <- sleepstudy[sample(nrow(sleepstudy), 50, replace = TRUE), ]
    df$Subject <- factor(df$Subject)
    df
  })
)

# Fit linear mixed models in parallel
fitted_models <- fit_models(
  datasets,
  .x = data,
  .f = ~ lme4::lmer(Reaction ~ Days + (Days | Subject), data = .),
  packages = c("lme4")
)

# Inspect the first fitted mixed model
summary(fitted_models$model[[1]])

# Tidy the fitted models using extract_model_results() for further evaluation
extracted <- extract_model_results(fitted_models)
head(extracted)

# Summarise estimates for 'Days' across simulated fits
extracted |>
  filter(term == "Days") |>
  evaluate_model_results(
    mean_estimate = mean(estimate, na.rm = TRUE),
    sd_estimate = sd(estimate, na.rm = TRUE)
  )