I have a dataset, which I'll call dataset1, with a response variable (e.g. Price). I'm hoping to find the single best predictor of Price among the n other variables in the dataset. But if n is large, I can't manually fit and examine all of these models, so I was hoping to use something like this:
for (i in names(dataset1)) {
  model = lm(Price~i, dataset1)
  # Do stuff here with model, such as analyze R^2 values.
}
(I thought this would work since replacing the inside of the for loop with print(i) results in the correct names.) The error is as follows:
Error in model.frame.default(formula = Price ~ i, data = dataset1, drop.unused.levels = TRUE) :
variable lengths differ (found for 'i')
Does anyone have advice on how to get R to use the column whose name is stored in i, rather than treating i as a variable in its own right? I know how to approach this problem in other software, but I would like to get a sense of how R works.
Personally, I would go for one of the *apply functions here:
dat <- data.frame(price=1:10,y=10:1,z=1:10)
sapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q)))[2])
y z
-1 1
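The `[2]` in the `sapply()` call relies on how R indexes matrices: `coef(summary(fit))` is a matrix with one row per coefficient and the columns Estimate, Std. Error, t value and Pr(>|t|), and a single index counts down the columns first, so element 2 is the slope estimate. A small sketch on the built-in `mtcars` data (chosen only for illustration, it is not part of the question) makes that explicit and shows the equivalent lookup by row and column name:

```r
# Single-predictor model on built-in data (mtcars is only an illustrative stand-in)
fit <- lm(mpg ~ wt, data = mtcars)

# 2 x 4 matrix: rows "(Intercept)" and "wt"; columns Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(fit))
print(dim(tab))

# Matrices are stored column by column, so tab[2] is row 2 of column 1: the slope
slope_by_position <- tab[2]

# The same value, looked up by name instead of position
slope_by_name <- tab["wt", "Estimate"]
print(c(slope_by_position, slope_by_name))
```

Indexing by name is a little more verbose but does not silently change meaning if the table layout ever differs.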
Or, to get a list with the full coefficient table for each model:
lapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q))))
$y
            Estimate   Std. Error       t value      Pr(>|t|)
(Intercept)       11 1.137008e-15  9.674515e+15 1.459433e-125
q                 -1 1.832454e-16 -5.457163e+15 1.423911e-123

$z
                Estimate   Std. Error      t value      Pr(>|t|)
(Intercept) 1.123467e-15 2.457583e-16 4.571429e+00  1.822371e-03
q           1.000000e+00 3.960754e-17 2.524772e+16 6.783304e-129
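If you want the fitted `lm` objects themselves (to call `summary()`, `anova()`, `plot()` and so on later) rather than only the coefficient tables, `lapply()` can return the models directly. A minimal sketch, assuming the built-in `mtcars` data with `mpg` in the role of price (an illustrative choice, not the asker's data); `reformulate()` builds each formula from a column name, so the coefficients are labelled with the real variable names instead of `q`:

```r
# mpg stands in for price; every other column is a candidate predictor
response <- "mpg"
predictors <- setdiff(names(mtcars), response)

# One single-predictor model per column; reformulate("wt", "mpg") gives mpg ~ wt
fits <- lapply(predictors, function(p) lm(reformulate(p, response), data = mtcars))
names(fits) <- predictors

# Any model can now be inspected in full
summary(fits[["wt"]])
```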
To get the R^2 values, as you mentioned:
sapply(dat[2:3], function(q) summary(lm(dat$price ~ q))$r.squared)
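The same `sapply()` pattern scales to any number of columns, and sorting the result answers the original question of which single predictor fits best. A minimal sketch, again assuming `mtcars` with `mpg` standing in for price (illustrative data only):

```r
# mtcars$mpg plays the role of price; every other column is a candidate predictor
candidates <- mtcars[, names(mtcars) != "mpg"]

# Same pattern as above: one R^2 per column, named by column
r2 <- sapply(candidates, function(q) summary(lm(mtcars$mpg ~ q))$r.squared)

# Rank predictors from best to worst single-variable fit
print(round(sort(r2, decreasing = TRUE), 3))
best <- names(which.max(r2))
print(best)
```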
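At the moment, i holds a column name as a character string, and Price ~ i doesn't substitute that name into the formula. Try looping over the column positions instead: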
for(i in 2:ncol(dataset1)) #assuming Price is column 1
Then refer to
Price ~ dataset1[, i]
in your loop.
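Filled out, the index-based loop might look like the sketch below. `dataset1` here is a small made-up data frame (hypothetical, purely so the code runs) with `Price` in column 1 and three predictors `x1`, `x2`, `x3`; `Price` is built from `x1` plus noise, so `x1` should come out on top.

```r
# Hypothetical stand-in for dataset1: Price in column 1, predictors after it
set.seed(42)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
dataset1 <- data.frame(Price = 3 * x1 + rnorm(n, sd = 0.5), x1 = x1, x2 = x2, x3 = x3)

# One slot per predictor column, named after that column
r2 <- setNames(numeric(ncol(dataset1) - 1), names(dataset1)[-1])

for (i in 2:ncol(dataset1)) {  # assuming Price is column 1
  model <- lm(Price ~ dataset1[, i], data = dataset1)
  r2[names(dataset1)[i]] <- summary(model)$r.squared
}

print(round(r2, 3))
```

One side effect of this style is that every model's slope is labelled `dataset1[, i]` rather than the column name, which is why the sketch records the name separately.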
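But I'm not sure your approach is sound from a statistical perspective; for example, choosing the best of many single-predictor fits by R^2 ignores the multiple comparisons involved.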
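The name-based loop from the question can also be made to work. In `Price ~ i`, R does not substitute the column name stored in `i`; it looks up a variable literally called `i`, finds a length-one character string in the calling environment, and reports that its length differs from `Price`. Building the formula from the string avoids that. A minimal sketch, assuming `mtcars` with `mpg` as a stand-in for `Price` (illustrative data only); it also skips the response column, which the original loop would otherwise regress on itself:

```r
response <- "mpg"   # stands in for "Price"
r2 <- numeric(0)    # named vector, grown one predictor at a time

# Skip the response itself; otherwise it would be regressed on itself
for (i in setdiff(names(mtcars), response)) {
  f <- as.formula(paste(response, "~", i))   # e.g. mpg ~ wt
  model <- lm(f, data = mtcars)
  r2[i] <- summary(model)$r.squared
}

print(round(r2, 3))
```

`reformulate(i, response)` builds the same formula without pasting strings. Either way, because the formula now names the real column, `coef(model)` is labelled with that column name.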