Reputation: 1

Running Multiple Linear Regression Models in for-Loop

The logic is similar to a content-based recommender. My data looks like this:

content	undesirable	desirable	user_1	user_10
1	3.00	2.77	0.11	NA
...
5000	2.50	2.11	NA	0.12

I need to fit a model with undesirable and desirable as the independent variables and each user column as the dependent variable, so I have to fit the model 10 times and predict each user's NA values.

This is the code I have hard-coded. I would like to know how to do this with a for loop; I tried several methods I found, but none of them worked for me.

The data frame is named 'test'.

Hard-coded version:

#fit model
fit_1 = lm(user_1 ~ undesirable + desirable, data = test)
...
fit_10 = lm(user_10 ~ undesirable + desirable, data = test)

#prediction
u_1_na = test[is.na(test$user_1), c('user_1', 'undesirable', 'desirable')]
result1 = predict(fit_1, newdata = u_1_na)
which(result1 == max(result1))
max(result1)
...
u_10_na = test[is.na(test$user_10), c('user_10', 'undesirable', 'desirable')]
result10 = predict(fit_10, newdata = u_10_na)
which(result10 == max(result10))
max(result10)

#write to csv file
#(write each user's maximum predicted value to a CSV file)

This is what I have tried so far (for loop):

mod_summaries <- list() 

for(i in 1:10) {                 
  
  predictors_i <- colnames(data)[1:10]   
  mod_summaries[[i - 1]] <- summary(     
    lm(predictors_i ~ ., test[ , c("undesirable", 'desirable')]))
  
}

Upvotes: 0

Answers (3)

M.Viking

Reputation: 5418

An apply method:

mod_summaries_lapply <-
  lapply(
    colnames(mtcars),
    FUN = function(x)
      summary(lm(reformulate(".", response = x), data = mtcars))
  )

A for-loop method that fits a linear model for each column. The key is the reformulate() function, which builds the formula from strings. In the question, a character string is placed directly in the formula, which results in the error "invalid term in model formula". The string must first be converted into a formula, for example with reformulate() or as.formula(). This example uses the mtcars dataset.

mod_summaries <- list() 
for(i in 1:11) {                 
  predictors_i <- colnames(mtcars)[i]   
  mod_summaries[[i]] <- summary(lm(reformulate(".", response = predictors_i), data=mtcars))
  #summary(lm(reformulate(". -1", response = predictors_i), data=mtcars))  # -1 to exclude intercept
  #summary(lm(as.formula(paste(predictors_i, "~ .")), data=mtcars)) # a "paste as formula" method
}
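
Applied to the question's columns, the same reformulate() idea might look like the sketch below. Only the column names come from the question; the toy data frame, the seed, the NA positions, and the output file name max_predictions.csv are assumptions made up for illustration.

```r
# Toy stand-in for the question's data (all values are made up).
set.seed(1)
n <- 20
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5),
  user_1      = runif(n),
  user_2      = runif(n)
)
test$user_1[c(3, 7, 12)] <- NA   # ratings to be predicted
test$user_2[c(5, 9)]     <- NA

user_cols <- grep("^user_", colnames(test), value = TRUE)
results <- vector("list", length(user_cols))

for (i in seq_along(user_cols)) {
  user <- user_cols[i]
  # Build e.g. user_1 ~ undesirable + desirable from strings.
  fit <- lm(reformulate(c("undesirable", "desirable"), response = user),
            data = test)  # lm() drops rows with an NA response by default
  na_rows <- test[is.na(test[[user]]), ]
  pred <- predict(fit, newdata = na_rows)
  # Keep the content id and value of the highest prediction for this user.
  results[[i]] <- data.frame(
    user     = user,
    content  = na_rows$content[which.max(pred)],
    max_pred = max(pred)
  )
}

max_predictions <- do.call(rbind, results)

# Hypothetical output location; change the path as needed.
out_file <- file.path(tempdir(), "max_predictions.csv")
write.csv(max_predictions, out_file, row.names = FALSE)
```

Selecting the user columns with grep() rather than 1:10 keeps the loop independent of how many user columns there are and of their position in the data frame.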

Upvotes: 1

Limey

Reputation: 12585

Avoid the loop. Make your data tidy. Something like:

library(tidyverse)

test %>%
  select(-content) %>%
  pivot_longer(
    starts_with("user"),
    names_to="user",
    values_to="value"
  ) %>%
  group_by(user) %>%
  group_map(
    function(.x, .y) {
      summary(lm(value ~ ., data=.x))
    }
  )

Untested code since your example is not reproducible.

Upvotes: 0

Nico

Reputation: 506

You could use the function as.formula together with the paste function to create your formula. The following is an example:

formula_lm <- as.formula(
    paste(response_var, 
          paste(expl_var, collapse = " + "), 
          sep = " ~ "))

This implies that you have more than one explanatory variable (separated in the paste with +). If you only have one, omit the second paste.

With the created formula, you can use the lm function like this:

lm(formula_lm, data)

Edit: in your case, the vector expl_var would contain the undesirable and desirable variables.
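
As a concrete sketch of the above, here are the placeholders response_var and expl_var filled in with the column names from the question. The toy data frame and the seed are made up for illustration.

```r
response_var <- "user_1"
expl_var <- c("undesirable", "desirable")

formula_lm <- as.formula(
  paste(response_var,
        paste(expl_var, collapse = " + "),
        sep = " ~ "))
formula_lm
# user_1 ~ undesirable + desirable

# Made-up toy data using the question's column names.
set.seed(42)
test <- data.frame(
  undesirable = runif(10),
  desirable   = runif(10),
  user_1      = runif(10)
)

fit <- lm(formula_lm, data = test)
coef(fit)  # intercept plus one coefficient per explanatory variable
```

Inside a loop, only response_var would need to change on each iteration.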

Upvotes: 0
