Trajan

Reputation: 1430

Invalid type for the dependent variable in lm() in R programming

This is an R problem, not a statistics problem.

I am trying to perform multiple linear regression in R for a set of 20 independent variables and 1 dependent variable. The 20 independent variables are in one csv file and the 1 dependent variable is in another csv file. Each row in each file corresponds to one measurement a day.

I have managed to import the 20 independent variables using read.csv(...) into a data frame called "predictors". I then imported the dependent measurements, again using read.csv(...), into a data frame called "dependent".

However, when I run lm(dependent~X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19+X20)

(Note: X1, ..., X20 are the column headers of the predictors in that csv file)

I get the error:

Error in model.frame.default(formula = dependent ~ X1 + X2 + X3 + X4 + X5 + : invalid type (list) for variable 'dependent'

I cannot understand what is going wrong.

The predictors file looks something like this (but with columns up to X20):

[image not included]

and the dependent csv file looks like this:

[image not included]

Upvotes: 0

Views: 1618

Answers (1)

etienne

Reputation: 3678

Let's generate some random data for the predictors, df:

df<-replicate(5,rnorm(20)) # 20 x 5 matrix of standard normal draws
names<-paste0('X',1:5)     # column names X1, ..., X5
colnames(df)<-names

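dependent is already given in the comments. Since it comes from read.csv, it is a data frame (internally a list of columns), which is why lm() reports "invalid type (list)" when it is used directly as the response. We can use cbind to combine it with the predictors into one data frame: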

newDf<-cbind(dependent,df)

head(newDf)
#    dependent           X1         X2         X3           X4          X5
# 1 0.49295341 -1.728304515  0.9902622  0.6164557  0.904435464 -0.65801021
# 2 0.04331689  0.641830028  2.3829267  0.6165678  0.002691661  0.85520221
# 3 0.53106346 -1.529310531  0.6644159 -1.6921015 -1.176692158  1.15293623
# 4 0.06983530  0.001683688  0.2073812  0.3687421 -1.318220727  0.27627456
# 5 0.74574779  0.250247821 -2.2106331  0.9678592 -0.592997366  0.14410466
# 6 0.56349179  0.563867390  2.6917140  1.2765787  0.797380501 -0.07562508
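Before fitting, it can help to check the types involved. The following is a minimal sketch on simulated data (the seed, the sizes, and the uniform draws used for dependent are assumptions, not the asker's actual files). It checks that the imported object is a list, that the response column inside the combined data frame is numeric, and that passing the data frame itself as the response fails.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical stand-ins for the objects used above
df <- replicate(5, rnorm(20))
colnames(df) <- paste0("X", 1:5)
dependent <- data.frame(dependent = runif(20))
newDf <- cbind(dependent, df)

# The imported object is a data frame, i.e. a list of columns
is.list(dependent)            # TRUE

# Inside newDf, the response column is a plain numeric vector
is.numeric(newDf$dependent)   # TRUE

# Using the data frame itself as the response fails in model.frame
res <- try(lm(dependent ~ X1 + X2, data = as.data.frame(df)), silent = TRUE)
inherits(res, "try-error")    # TRUE
```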

We can then run the regression:

lm(dependent~.,newDf) # . selects all the other columns of newDf

# Call:
# lm(formula = dependent ~ ., data = newDf)

# Coefficients:
# (Intercept)           X1           X2           X3           X4           X5  
#     0.50522     -0.09975     -0.03040      0.06431     -0.00398     -0.09596 
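An alternative that avoids cbind is to pull the single column out of dependent as a vector and keep the predictors in their own data frame. This is only a sketch under assumptions: the data are simulated, and y and fit are hypothetical names introduced for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated stand-ins for the two imported csv files (hypothetical data)
predictors <- as.data.frame(replicate(5, rnorm(20)))
colnames(predictors) <- paste0("X", 1:5)
dependent <- data.frame(dependent = runif(20))

# Extract the single column as a plain numeric vector
y <- dependent[[1]]

# y is looked up in the calling environment;
# "." expands to every column of predictors
fit <- lm(y ~ ., data = predictors)
coef(fit)
```

With the real files, the same pattern would apply to the data frames returned by read.csv, provided both have the same number of rows in the same order.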

Upvotes: 1
