Lumberjack88

Reputation: 15

What format of x and y inputs does R glmnet expect?

I have a data set that looks like this:

[image: dataset]

I'm interested in the best possible multiple linear regression, which is why I'm trying the LASSO method.

R, which stands for stock market returns, should be the dependent variable, whereas all the others (except D/Date and P/Price) are independent variables.

Here's what I've tried so far:

library(Matrix)
library(foreach)
library(glmnet)

trainX <- spxdata[c(4:11)]
trainY <- spxdata[c(3)]

CV = cv.glmnet(x = trainX, y = trainY, alpha = 1, nlambda = 100)

and this gives me the following error message:

Error in storage.mode(y) <- "double" : (list) object cannot be coerced to type 'double'

I'm not accustomed to R and use it only rarely, so I'm not sure how to approach this problem. I guess it has something to do with the format of my trainX and trainY subsets, but what exactly have I done wrong here?

Upvotes: 1

Views: 2494

Answers (1)

Hong Ooi

Reputation: 57686

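glmnet expects x to be a matrix, not a data frame, which is what you have there. Similarly, the response y should be a vector, not a one-column data frame. A data frame is a list of columns internally, which is why the error complains that a (list) object cannot be coerced to type 'double'.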

You can get these with

trainX <- as.matrix(spxdata[4:11])
trainY <- spxdata[[3]]                  # not [3]
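
To see the difference between the two kinds of subsetting, it can help to inspect the objects directly. The following is only a minimal sketch on a made-up data frame; toy and its column names are hypothetical stand-ins for spxdata, not the real data.

```r
# Hypothetical stand-in for spxdata: date, price, returns, then two predictors
toy <- data.frame(
  D  = as.Date("2020-01-01") + 0:4,
  P  = c(100, 101, 99, 102, 103),
  R  = c(0.01, 0.01, -0.02, 0.03, 0.01),
  X1 = c(1.2, 0.7, 1.5, 0.9, 1.1),
  X2 = c(3, 4, 2, 5, 4)
)

y_df  <- toy[3]               # single bracket: still a data frame (a list)
y_vec <- toy[[3]]             # double bracket: the underlying numeric vector
x_mat <- as.matrix(toy[4:5])  # numeric matrix, column names are kept

class(y_df)        # "data.frame"
is.numeric(y_vec)  # TRUE
is.matrix(x_mat)   # TRUE
```

Note that as.matrix only gives a numeric matrix when every selected column is numeric; a character or factor column would turn the whole matrix into character.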

But in general, you may want to avoid these and other issues by using my glmnetUtils package, which implements a formula interface to glmnet. This lets you use it the same way you'd use glm or rpart or other modelling functions.
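
As a rough illustration of the formula interface, here is a sketch on simulated data. The data frame toy and its column names are made up for the example, and the call is guarded so that it only runs if glmnetUtils is installed.

```r
# Simulated data; column names are hypothetical, not those of spxdata
set.seed(1)
toy <- data.frame(
  R  = rnorm(50),
  X1 = rnorm(50),
  X2 = rnorm(50),
  X3 = rnorm(50)
)

if (requireNamespace("glmnetUtils", quietly = TRUE)) {
  # Response on the left of ~, "." means "all other columns in data"
  cv_fit <- glmnetUtils::cv.glmnet(R ~ ., data = toy, alpha = 1)
  # lambda.min is the penalty with the lowest cross-validated error
  print(cv_fit$lambda.min)
}
```

With the real data you would drop the date and price columns first, or name the predictors explicitly on the right-hand side of the formula.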

Upvotes: 3
