Reputation: 1430
This is an R problem, not a statistics problem.
I am trying to perform multiple linear regression in R for a set of 20 independent variables and 1 dependent variable. The 20 independent variables are in one csv file and the 1 dependent variable is in another csv file. Each row in each file corresponds to one measurement a day.
I have managed to import the 20 independent variables using read.csv(...) into an object called "predictors". I then imported the dependent measurements, again using read.csv(...), into an object called "dependent".
However, when I use lm(dependent~X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19+X20)
(Note: X1,...,X20 are the headers of the columns for the predictors in that csv file)
I get the error:
Error in model.frame.default(formula = dependent ~ X1 + X2 + X3 + X4 + X5 + : invalid type (list) for variable 'dependent'
I cannot understand what is going wrong.
The predictors file looks something like (but up to X20)
and the dependent csv file looks like
Upvotes: 0
Views: 1618
Reputation: 3678
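Let's generate some random data for df:
df <- replicate(5, rnorm(20))    # 20 x 5 matrix of standard normal draws
col_names <- paste0('X', 1:5)    # "X1" ... "X5"; avoids masking base::names()
colnames(df) <- col_names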
dependent is already given in the comments, so we can use cbind to create one data frame:
newDf <- cbind(dependent, df)
head(newDf)
# dependent X1 X2 X3 X4 X5
# 1 0.49295341 -1.728304515 0.9902622 0.6164557 0.904435464 -0.65801021
# 2 0.04331689 0.641830028 2.3829267 0.6165678 0.002691661 0.85520221
# 3 0.53106346 -1.529310531 0.6644159 -1.6921015 -1.176692158 1.15293623
# 4 0.06983530 0.001683688 0.2073812 0.3687421 -1.318220727 0.27627456
# 5 0.74574779 0.250247821 -2.2106331 0.9678592 -0.592997366 0.14410466
# 6 0.56349179 0.563867390 2.6917140 1.2765787 0.797380501 -0.07562508
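For the two-file setup described in the question, the same cbind step might look like the sketch below. The real files are not shown, so the temporary file paths, the column name y, and the simulated values are all placeholders; two small CSV files are written first only to make the example runnable.

```r
# Hypothetical stand-ins for the two CSV files from the question.
set.seed(1)
pred_file <- tempfile(fileext = ".csv")
dep_file  <- tempfile(fileext = ".csv")

sim_predictors <- as.data.frame(replicate(5, rnorm(20)))
names(sim_predictors) <- paste0("X", 1:5)
write.csv(sim_predictors, pred_file, row.names = FALSE)
write.csv(data.frame(y = rnorm(20)), dep_file, row.names = FALSE)

# read.csv() returns a data frame, even when the file has a single column.
predictors <- read.csv(pred_file)
dependent  <- read.csv(dep_file)

# cbind() matches rows by position, so both files must list the days
# in the same order and have the same number of rows.
stopifnot(nrow(predictors) == nrow(dependent))
combined <- cbind(dependent, predictors)

str(combined)  # one data frame: the response column followed by X1..X5
```

With the real data, the response column in the combined data frame takes its name from the header in the dependent file, and that name is what goes on the left-hand side of the formula.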
We can then run the regression :
lm(dependent ~ ., data = newDf) # . selects all the other columns of newDf
# Call:
# lm(formula = dependent ~ ., data = newDf)
# Coefficients:
# (Intercept) X1 X2 X3 X4 X5
# 0.50522 -0.09975 -0.03040 0.06431 -0.00398 -0.09596
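As for why the original call failed: read.csv returns a data frame, and a data frame is a list of columns, so lm was handed a list where it expected a numeric vector. Below is a minimal reproduction with simulated data, followed by an alternative that skips cbind. The object names mirror the question; the column name y and the use of three predictors instead of twenty are assumptions for illustration.

```r
set.seed(42)
predictors <- as.data.frame(replicate(3, rnorm(20)))
names(predictors) <- paste0("X", 1:3)
dependent <- data.frame(y = rnorm(20))  # shaped like the result of read.csv()

is.list(dependent)  # TRUE: a data frame is a list of columns

# Using the whole data frame as the response triggers the error;
# try() captures it so the script keeps running.
res <- try(lm(dependent ~ X1 + X2 + X3, data = predictors), silent = TRUE)
inherits(res, "try-error")  # TRUE

# Extracting the single column as a plain numeric vector avoids the problem.
y <- dependent[[1]]
fit <- lm(y ~ ., data = predictors)
coef(fit)
```

Either route works: combining everything into one data frame keeps the response and predictors together, while extracting the column leaves the two imported objects untouched.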
Upvotes: 1