Reputation: 15
I have a data set that looks like this:
I'm looking for the best possible multiple linear regression, which is why I'm trying the LASSO method.
R, which stands for stock market returns, should be the dependent variable, whereas all the others (except D/Date and P/Price) are independent variables.
Here's what I've tried so far:
library(Matrix)
library(foreach)
library(glmnet)
trainX <- spxdata[c(4:11)]
trainY <- spxdata[c(3)]
CV = cv.glmnet(x = trainX, y = trainY, alpha = 1, nlambda = 100)
and this gives me the following error message:
Error in storage.mode(y) <- "double" : (list) object cannot be coerced to type 'double'
I'm not accustomed to R and only use it rarely, so I'm not sure how to go about this problem. I guess it has something to do with the format of my trainX and trainY subset, but what exactly have I done wrong here?
Upvotes: 1
Views: 2494
Reputation: 57686
glmnet expects the predictors as a numeric matrix, not a data frame, which is what you have there. Similarly, the response should be a vector, not a one-column data frame. A data frame is a list, which is why the coercion to 'double' fails.
You can get these with
trainX <- as.matrix(spxdata[4:11])
trainY <- spxdata[[3]] # not [3]
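To see the difference between the two kinds of subsetting, here is a minimal sketch on made-up data. The real spxdata isn't shown in the question, so the data frame below is a synthetic stand-in that only mimics its layout (D and P in columns 1 and 2, R in column 3, eight predictors in columns 4 to 11). The predictor names X1 to X8 are invented for illustration.

```r
library(glmnet)

# Synthetic stand-in for spxdata (made-up values, hypothetical column names):
# columns 1-2 mimic D and P, column 3 is the response R,
# columns 4-11 are eight numeric predictors.
set.seed(1)
n <- 100
predictors <- as.data.frame(matrix(rnorm(n * 8), nrow = n))
names(predictors) <- paste0("X", 1:8)
spxdata <- data.frame(
  D = seq(as.Date("2000-01-01"), by = "month", length.out = n),
  P = cumsum(rnorm(n)) + 100,
  R = rnorm(n)
)
spxdata <- cbind(spxdata, predictors)

# Single brackets return a one-column data frame (a list);
# double brackets extract the column itself as a numeric vector.
is.list(spxdata[3])
is.numeric(spxdata[[3]])

trainX <- as.matrix(spxdata[4:11])  # numeric matrix of predictors
trainY <- spxdata[[3]]              # numeric response vector

# Cross-validated LASSO (alpha = 1), same arguments as in the question
cv_fit <- cv.glmnet(x = trainX, y = trainY, alpha = 1, nlambda = 100)

# Coefficients at the lambda with the lowest cross-validated error
coef(cv_fit, s = "lambda.min")
```

With the real data, only the two subsetting lines should need to change from what was tried in the question.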
But in general, you may want to avoid these and other issues by using my glmnetUtils package, which implements a formula interface to glmnet. This lets you use it the same way you'd use glm, rpart, or other modelling functions.
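As a rough sketch of what the formula interface looks like, the example below fits the same kind of cross-validated LASSO on a made-up data frame. The data and the column names V1 to V8 are placeholders rather than the real spxdata, and the response is assumed to be called R.

```r
library(glmnetUtils)

# Made-up data: eight numeric predictors V1..V8 plus a response column R
set.seed(1)
n <- 100
df <- as.data.frame(matrix(rnorm(n * 8), nrow = n))
df$R <- rnorm(n)

# Formula interface: no manual conversion to matrix or vector is needed.
# Extra arguments such as alpha and nlambda are passed through to glmnet.
cv_fit <- glmnetUtils::cv.glmnet(R ~ ., data = df, alpha = 1, nlambda = 100)

# Coefficients at the lambda with the lowest cross-validated error
coef(cv_fit, s = "lambda.min")
```

For the data in the question, something like cv.glmnet(R ~ ., data = spxdata[3:11], alpha = 1) should work, assuming the third column is actually named R; subsetting to columns 3 to 11 keeps the date and price columns out of the model.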
Upvotes: 3