Reputation: 7
I am looking for a shorter way to write this, perhaps with a for loop,
i.e. with i running from 1 to 22 so that columns 1 through 22 are added to the multiple regression:
reg <- lm(log(y) ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+z1+z2+z3+z4+z5+z6+z7+z8+z9+z10+z11+z12, data)
To clarify, x1, x2, x3 and so on are all column names: x2 means "x two", not x squared. I am trying to run a multiple regression on the last 22 columns of my data set.
Someone suggested doing this:
reg1 <- lm(log(data$y)~terms( as.formula(
paste(" ~ (", paste0("X", 29:ncol(data) , collapse="+"), ")")
)
))
But
Upvotes: 0
Views: 198
Reputation: 263441
I know that a for-loop was requested but it would have been a clumsy strategy, so here's a possible correct strategy:
formchr <- paste(
paste( "log(y)" , paste0( "x", 1:10, collapse="+"), sep="~"),
# the LHS and first 10 terms
paste0( "z", 1:12, collapse="+"), #next 12 terms
sep="+") # put both parts together
reg1 <- lm( as.formula(formchr), data=data)
The full character version of the formula should be passed to the as.formula function, and the paste and paste0 functions are fully vectorized, so no loop is needed.
If the first 22 columns were the desired target for the RHS terms, you could have pasted together names(data)[1:22], or names(data)[29:50] if those were the locations, and this would be substituted for the RHS terms in the second paste above, dropping the third paste.
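As a minimal sketch of the names-based variant described above: the data frame dat, its default column names V1 to V50, and the simulated values are all stand-ins invented for illustration, not the asker's data. The second call uses reformulate() from the stats package, which builds the same formula from a character vector of term labels and may read more cleanly than nested paste calls.

```r
# Hypothetical stand-in data: 50 predictor columns (V1..V50) and a positive response y
set.seed(1)
dat <- as.data.frame(matrix(rnorm(100 * 50), nrow = 100))
dat$y <- exp(rnorm(100))

# Column names picked by position, here columns 29 through 50
rhs_names <- names(dat)[29:50]

# Same idea as above: build the formula as one string, then convert it
formchr <- paste("log(y)", paste(rhs_names, collapse = "+"), sep = "~")
reg1 <- lm(as.formula(formchr), data = dat)

# Alternative: let reformulate() assemble the formula from the term labels
reg2 <- lm(reformulate(rhs_names, response = quote(log(y))), data = dat)
```

Both calls should fit the same model, with one intercept plus 22 slope coefficients.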
The only reason I used data as the name of an object is that it was implied by the question. It is a very confusing practice to use that name: data is an R function, and objects should have specific names that do not overlap with function names. The other very commonly abused name in this regard is df, which is the density function for the F distribution.
Upvotes: 1
Reputation: 1410
You could first subset your data into a data.frame which contains only the columns of interest. Then, you can run a linear model using the . formula syntax to select all columns other than the y variable.
Example using 1000 rows and 50 cols of data
N <- 1000
P <- 50
# Simulate P distinct predictor columns (rep() would copy one column P times)
data <- as.data.frame(matrix(rnorm(N * P), nrow = N, ncol = P))
Assign your y data to y.
# Keep y strictly positive so that log(y) is defined
y <- exp(rnorm(N))
Create a new data.frame containing y and the last 22 columns.
model_data <- cbind(y, data[ ,29:50])
colnames(model_data) <- c("y", paste0("x", 1:10), paste0("z",1:12))
The following should do the trick. The . formula syntax will select all columns other than the y column.
reg <- lm(log(y) ~ ., data = model_data)
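To check what the . actually expanded to, one option is to inspect the term labels of the fitted model. The sketch below rebuilds a small model_data from simulated values (an assumption for illustration, with the same column names as above) so that it runs on its own.

```r
# Hypothetical stand-in for model_data: 22 predictors plus a positive response y
set.seed(42)
model_data <- as.data.frame(matrix(rnorm(200 * 22), nrow = 200))
colnames(model_data) <- c(paste0("x", 1:10), paste0("z", 1:12))
model_data$y <- exp(rnorm(200))

reg <- lm(log(y) ~ ., data = model_data)

# The term labels show which columns the dot was expanded into
expanded <- attr(terms(reg), "term.labels")
print(expanded)
```

The response column y is used inside log(y) on the left-hand side, so it is not repeated among the predictors.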
Upvotes: 0