Reputation: 429
I have the following code:
reg <- lm(Y ~ x1 + x1_sq + x2 + x2_sq + x1x2 + d2 + d3 + d4, df)
Where all x_i are continuous variables and d_i are mutually exclusive dummy variables (d1 is present but excluded to avoid perfect multicollinearity). Rather than including the dummy variables, I want to run a separate regression for each subset where a dummy variable == 1. I wish to achieve this through a loop in the following form:
dummylist <- list("d1", "d2", "d3", "d4")
for(i in dummylist){
if(i==1){
ireg <- lm(Y ~ x1 + x1_sq + x2 + x2_sq + x1x2, df)
} else {
Unsure what to put here
}
}
My three(?) questions are:
Sorry if this is too much, please let me know if it is and I can cut it down or separate into multiple questions. I could not find a similar question, probably as I am rather new to running loops in R and don't know what to look for.
Upvotes: 0
Views: 457
Reputation: 8582
Short: No
In R there are many data types. One of the more versatile ones is the list object, which can store any type of object. Alternatively one could create an environment to store the models in, but that is a bit overkill.
If you know roughly how many elements should be in your list, the easiest is to initialize it prior to your loop as
n <- 3
regList <- vector(mode = "list", length = n)
# Optional naming:
#names(regList) <- c("d1 reg", "d2 reg", "d3 reg")
In your loop you then fill in your list iteratively:
for(i in seq_along(regList)){
regList[[i]] <- lm(...)
}
It is not entirely clear what you want here. Maybe you 'only' want to include the separate dummy variables. For this the simplest is likely to save your formula and update it iteratively.
form <- Y ~ x1 + x1_sq + x2 + x2_sq + x1x2
for(i in seq_along(regList)){
# paste0() concatenates strings; ". ~ . + d1" means "keep the formula and add the term d1"
form <- update(form, as.formula(paste0(". ~ . + d", i)))
regList[[i]] <- lm(form, data = df)
}
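Note that the loop above overwrites form on every pass, so the dummies accumulate (d1, then d1 + d2, and so on). A minimal sketch of the difference, using a shortened, hypothetical base formula; only the formulas are built here, no data is needed:

```r
base_form <- Y ~ x1 + x2  # shortened stand-in for the full formula

# One dummy per formula: always update the untouched base formula
forms <- lapply(1:3, function(i) {
  update(base_form, as.formula(paste0(". ~ . + d", i)))
})

# Accumulating version: each pass updates the result of the previous one
acc <- base_form
for (i in 1:3) {
  acc <- update(acc, as.formula(paste0(". ~ . + d", i)))
}

forms[[2]]  # contains d2 only
acc         # contains d1, d2 and d3
```

Which of the two you want depends on whether each model should contain a single dummy or all dummies up to d[i].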
Or maybe you are actually trying to run separate regressions on the subset where d[i] == 1. This can be done with lm itself:
form <- Y ~ x1 + x1_sq + x2 + x2_sq + x1x2
d <- list(d1, d2, d3)
for(i in seq_along(regList)){
#Using the subset argument
regList[[i]] <- lm(form, data = df, subset = which(d[[i]] == 1))
#Alternatively:
#regList[[i]] <- lm(form, data = subset(df, d[[i]] == 1))
}
Disclaimer: It is not entirely clear whether d1, d2, d3 are columns of df. If they are, list(d1, d2, d3) will not find them, and the rows should instead be selected by column name:
regList[[i]] <- lm(form, data = df[df[[paste0("d", i)]] == 1, ])
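For completeness, here is a self-contained sketch of the subset approach on simulated data. Everything about the data (the grouping variable g, the coefficients, the sample size) is made up for illustration; only the column names follow the question, and the dummies are assumed to be columns of df.

```r
set.seed(1)
n <- 200

# Hypothetical data: a random group label, expanded into dummies d1..d4
g  <- sample(1:4, n, replace = TRUE)
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$x1_sq <- df$x1^2
df$x2_sq <- df$x2^2
df$x1x2  <- df$x1 * df$x2
df$Y     <- 1 + 0.5 * df$x1 - 0.3 * df$x2 + g + rnorm(n)
for (k in 1:4) df[[paste0("d", k)]] <- as.integer(g == k)

form    <- Y ~ x1 + x1_sq + x2 + x2_sq + x1x2
dummies <- paste0("d", 1:4)

# Pre-allocate a named list, then fit one model per dummy on its own rows
regList <- setNames(vector(mode = "list", length = length(dummies)), dummies)
for (nm in dummies) {
  regList[[nm]] <- lm(form, data = df[df[[nm]] == 1, ])
}

# One column of coefficients per subgroup
sapply(regList, coef)
```

Looping over the names rather than over 1, 2, 3 means each fitted model can later be retrieved as regList[["d2"]], and no separate list d is needed.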
A loop is not clearly the correct approach in this case, but it is not a wrong approach in all circumstances either. Here it just doesn't serve a clear purpose. And note that i in dummylist would return "d1", "d2", "d3", "d4", as the variables have been quoted rather than placed directly within the list.
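A quick way to see this is to record what the loop variable actually takes on. The sketch below uses the dummylist from the question; seen is just a helper vector introduced for illustration.

```r
dummylist <- list("d1", "d2", "d3", "d4")

seen <- character(0)
for (i in dummylist) {
  seen <- c(seen, i)  # i is a string such as "d1", never the number 1
}

seen                # the quoted names, in order
any(seen == 1)      # the condition i == 1 from the question never holds
any(seen == "d1")   # comparing against the name does
```

So a condition like i == 1 would have to be written as i == "d1", or the loop should run over seq_along(dummylist) instead.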
Another thing to address is whether you have transformed the variables yourself before performing your linear regression. Note that R lets you do this directly in the formula, and doing so helps you avoid simple mistakes, such as testing a lower-order variable on its own while a squared term or interaction involving it is in the model (unless that is really what you want to do). For example, I assume x1_sq = x1^2. Maybe d1, d2, d3 are all contained in a variable d? In these cases you should use the original variables as shown below:
lm(formula = Y ~ poly(x1, 2, raw = TRUE) + poly(x2, 2, raw = TRUE) + x1:x2, data = df ) #+d if d1, d2, d3 is part of the formula
Here poly(x, 2) is the second-order polynomial, and raw = TRUE returns the parameters as x1 + I(x1^2) rather than the orthogonal representation.
If one does this, the output of drop1, anova etc. will take into account that the first-order variables should not be tested separately from their second-order terms.
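As a rough check on simulated data (the data-generating coefficients are invented for illustration), the hand-made columns and the in-formula version give the same fit, while drop1 handles each poly() term as a single unit:

```r
set.seed(42)
n  <- 100
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$Y <- 1 + df$x1 + 0.5 * df$x1^2 - df$x2 + 0.25 * df$x2^2 +
  0.3 * df$x1 * df$x2 + rnorm(n)

# Hand-made columns, as in the question
df$x1_sq <- df$x1^2
df$x2_sq <- df$x2^2
df$x1x2  <- df$x1 * df$x2
fit_manual <- lm(Y ~ x1 + x1_sq + x2 + x2_sq + x1x2, data = df)

# Same model with the transformations written in the formula
fit_poly <- lm(Y ~ poly(x1, 2, raw = TRUE) + poly(x2, 2, raw = TRUE) + x1:x2,
               data = df)

# The coefficients agree; only their names differ
all.equal(unname(coef(fit_manual)), unname(coef(fit_poly)))

# Each poly() term is dropped as a whole (2 degrees of freedom)
drop_tab <- drop1(fit_poly, test = "F")
drop_tab
```

With the hand-made columns, drop1 would instead offer x1 and x1_sq as two unrelated single-df terms.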
Upvotes: 1