Reputation: 1761

Repeat regression with varying dependent variable

I've searched both Stack Overflow and Google for a solution, but found nothing that solves my problem.

I have about 40 dependent variables, for which I aim to obtain adjusted means (lsmeans). I need adjusted means for group A and group B, after accounting for some covariates. My final object should be a data frame with predicted means for all 40 dependent variables for group A and group B.

This is what I tried, without any success:

# Exemplified here with 2 outcome variables
outcome1 <- c(2, 4, 6, 8, 10, 12, 14, 16)
outcome2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
var1 <- c("a", "a", "a", "a", "b", "b", "b", "b")
var2 <- c(10, 11, 12, 9, 14, 9, 5, 8)
var3 <- c(100, 101, 120, 90, 140, 90, 50, 80)

df <- data.frame(outcome1, outcome2, var1, var2, var3)

dependents <- c(outcome1, outcome2)

library(lsmeans) #install.packages("lsmeans")

results <- list()
for (i in seq_along(dependents) {
    fit <- lm(i ~ var1 + var2 + var3, data= df)
    summary <- summary(lsmeans(fit, "var1"))
    summary$outcome <- i
    results[i] <- summary
    }

Upvotes: 2

Answers (3)

CoderGuy123

Reputation: 6669

The lazyeval package provides helper functions for working with formulas, such as f_lhs() for getting and setting the left-hand side.

Here's my version of your code:

#load libs
library(tidyverse)
library(lazyeval)
library(lsmeans)

#make data
df = tibble(
  y1 = c(2, 4, 6, 8, 10, 12, 14, 16),
  y2 = c(1, 2, 3, 4, 5, 6, 7, 8),
  var1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  var2 = c(10, 11, 12, 9, 14, 9, 5, 8),
  var3 = c(100, 101, 120, 90, 140, 90, 50, 80)
)

#outcomes
outcomes = c("y1", "y2")

#fit
results <- list()
for (i in seq_along(outcomes)) {
  #make a formula; the left-hand side i is only a placeholder
  f = i ~ var1 + var2 + var3
  
  #replace the placeholder with the outcome, which must be a symbol explicitly
  f_lhs(f) = as.symbol(outcomes[i])
  
  #fit
  fit <- lm(f, data = df)
  
  #save
  summary <- summary(lsmeans(fit, "var1"))
  results[[i]] = summary
}

#set outcome names
names(results) = outcomes

#print results
results

The last line prints:

$y1
 var1 lsmean   SE df lower.CL upper.CL
 a       5.5 1.38  4     1.68     9.32
 b      12.5 1.38  4     8.68    16.32

Confidence level used: 0.95 

$y2
 var1 lsmean    SE df lower.CL upper.CL
 a      2.75 0.688  4     0.84     4.66
 b      6.25 0.688  4     4.34     8.16

Confidence level used: 0.95

Generally, it is easier to build the model specification as a string and convert it to a formula just before fitting. Here I manipulated the formula object directly instead.
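The string-based approach mentioned above can be sketched with base R's reformulate(), which builds a formula from character vectors. This is only an illustration: the predictors vector and the formulas list are names introduced here, not part of the original code.

```r
# Outcome and predictor names as plain strings
outcomes <- c("y1", "y2")
predictors <- c("var1", "var2", "var3")

# Build one formula per outcome; reformulate() joins the predictors with "+"
# and puts the given response on the left-hand side
formulas <- lapply(outcomes, function(y) reformulate(predictors, response = y))
names(formulas) <- outcomes

# Each element can be passed straight to lm(), e.g. lm(formulas$y1, data = df)
formulas$y1
```

Because the outcome name stays a string until the last moment, it can also be reused as a label for the results.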

Upvotes: 1

akrun

Reputation: 887591

Here is another option using lapply.

dependents <- c('outcome1', 'outcome2')
lst <- lapply(dependents, function(x) {
         fit <- lm(paste(x,'~', 'var1+var2+var3'), data=df)
         summary(lsmeans(fit, 'var1', data=df))})
Map(cbind, lst, outcome = seq_along(dependents))

Upvotes: 3

Mike Wise

Reputation: 22827

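There were a few typos and other problems (a missing closing parenthesis in the for statement, dependents holding the values rather than the column names, the loop index used as the response in the formula, and results[i] instead of results[[i]]), but I think this is what you want:

# Exemplified here with 2 outcome variables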
outcome1 <- c(2, 4, 6, 8, 10, 12, 14, 16)
outcome2 <- c(1, 2, 3, 4, 5, 6, 7, 8)
var1 <- c("a", "a", "a", "a", "b", "b", "b", "b")
var2 <- c(10, 11, 12, 9, 14, 9, 5, 8)
var3 <- c(100, 101, 120, 90, 140, 90, 50, 80)

df <- data.frame(outcome1, outcome2, var1, var2, var3)

dependents <- c("outcome1", "outcome2")

library(lsmeans) #install.packages("lsmeans")

results <- list()
for (i in seq_along(dependents)) {
  eq <- paste(dependents[i],"~ var1 + var2 + var3")
  fit <- lm(as.formula(eq), data= df)
  summary <- summary(lsmeans(fit, "var1"))
  summary$outcome <- i
  results[[i]] <- summary
}
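Since the question asks for a single data frame, the per-outcome results can be stacked with do.call(rbind, ...). The following is a minimal base-R sketch rather than the lsmeans API: it assumes the adjusted mean for each level of var1 is the model prediction with var2 and var3 held at their sample means, which should agree with the lsmeans point estimates for a model like this one. The names grid, adjusted, adjusted_df and adj_mean are made up for illustration.

```r
# Same toy data as in the question
df <- data.frame(
  outcome1 = c(2, 4, 6, 8, 10, 12, 14, 16),
  outcome2 = c(1, 2, 3, 4, 5, 6, 7, 8),
  var1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  var2 = c(10, 11, 12, 9, 14, 9, 5, 8),
  var3 = c(100, 101, 120, 90, 140, 90, 50, 80)
)
dependents <- c("outcome1", "outcome2")

# Reference grid (assumption): each level of var1, covariates at their means
grid <- data.frame(
  var1 = sort(unique(df$var1)),
  var2 = mean(df$var2),
  var3 = mean(df$var3)
)

# Fit one model per outcome and predict on the reference grid
adjusted <- lapply(dependents, function(dep) {
  fit <- lm(as.formula(paste(dep, "~ var1 + var2 + var3")), data = df)
  data.frame(
    outcome = dep,
    var1 = grid$var1,
    adj_mean = unname(predict(fit, newdata = grid))
  )
})

# Stack the per-outcome pieces into one long data frame
adjusted_df <- do.call(rbind, adjusted)
adjusted_df
```

The same do.call(rbind, ...) step should also work on the results list built in the loop above, provided each element is first converted with as.data.frame().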

Upvotes: 2
