Reputation: 61
I'm trying to run a LASSO on our dataset, and to do so, I need to convert non-numeric variables to numeric, ideally via a sparse matrix. However, when I try to use the Matrix command, I get the following error:
Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix
I thought this was due to NAs in my data, so I ran na.omit and got the same error. I tried again with a small subset of my data and got the same error again:
> sparsecombined <- Matrix(combined1[1:10,],sparse=TRUE)
Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix
This is the data set I tried to convert with that last line of code:
Is there anything that jumps out that might prevent sparse conversion?
Upvotes: 5
Views: 2582
Reputation: 44340
I got this error by passing a data frame where a matrix was expected, and it looks like that is why you are getting it too. The solution is simple: convert your data to a matrix before passing it to the Matrix function:
sparsecombined <- Matrix(as.matrix(combined1[1:10,]),sparse=TRUE)
In your case, this code will probably still complain, because the data frame contains non-numeric data (e.g. the TailNum column), so as.matrix() returns a character matrix rather than a numeric one. You would need to subset to just the numeric columns first.
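As a rough sketch of selecting only the numeric columns before the conversion: the data frame below is a made-up stand-in for combined1 (its values are invented for illustration, and only the column names ArrDelay, ArrTime and TailNum come from this thread).

```r
library(Matrix)

# Hypothetical stand-in for combined1; values are invented for illustration
combined1 <- data.frame(
  ArrDelay = c(5, 0, 12, 0),
  ArrTime  = c(930, 0, 1415, 0),
  TailNum  = c("N101", "N202", "N303", "N404"),
  stringsAsFactors = FALSE
)

# Flag the numeric columns, then convert just those to a base matrix
numeric_cols <- vapply(combined1, is.numeric, logical(1))
numeric_mat  <- as.matrix(combined1[, numeric_cols, drop = FALSE])

# A numeric base matrix can be handed to Matrix() directly
sparsecombined <- Matrix(numeric_mat, sparse = TRUE)
```

Note that this drops the categorical columns entirely, so it only helps if those columns are not needed as predictors.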
Upvotes: 0
Reputation: 57696
The easiest way to incorporate categorical variables into a LASSO is to use my glmnetUtils package, which provides a formula/data frame interface to glmnet.
library(glmnetUtils)
glmnet(ArrDelay ~ ArrTime + uniqueCarrier + TailNum + Origin + Dest,
       data=combined1, sparse=TRUE)
This automatically handles categorical vars via one-hot encoding (also known as dummy variables). It can also use sparse matrices if so desired.
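For anyone who would rather build the encoded matrix by hand, one possible sketch uses sparse.model.matrix from the Matrix package, which expands factors into dummy columns and returns a sparse matrix in one step. The data frame here is a made-up stand-in for combined1 with invented values, and the column name UniqueCarrier is assumed to be a factor.

```r
library(Matrix)

# Hypothetical stand-in for combined1; values are invented for illustration
combined1 <- data.frame(
  ArrDelay      = c(5, 0, 12, 3),
  ArrTime       = c(930, 1100, 1415, 1620),
  UniqueCarrier = factor(c("AA", "UA", "AA", "DL"))
)

# One-hot encode the factor and build a sparse design matrix;
# "- 1" drops the intercept so every carrier level gets its own column
x <- sparse.model.matrix(ArrDelay ~ . - 1, data = combined1)
y <- combined1$ArrDelay
```

The resulting x and y could then be passed to the matrix interface of glmnet, which accepts sparse matrices from the Matrix package.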
Upvotes: 2
Reputation: 427
I think the error is due to the fact that you have non-numeric data types in your matrix.
Perhaps first convert your non-numeric columns, like UniqueCarrier, to binary vectors using one-hot encoding, and only then convert the matrix to sparse.
Here is my code that I used for that conversion:
# Convert a comma-separated text column (e.g. Genre or Director) into binary variables
# Each string becomes one document in a corpus; the document-term matrix then has 1s for the presence of a term and 0s for its absence
library(tm)
library(slam)
convertToBinary <- function(category) {
  # split each string on commas, dropping surrounding whitespace
  parts = strsplit(as.character(category), "\\s*,\\s*")
  # join multi-word names with underscores so each name stays one term,
  # then rebuild one space-separated string per row
  docs = vapply(parts, function(p) paste(gsub("\\s+", "_", p), collapse = " "), character(1))
  genreCorpus = Corpus(VectorSource(docs))
  dtm = DocumentTermMatrix(genreCorpus)
  # as.matrix() returns the full matrix; in recent tm versions inspect() only shows a sample
  return(as.matrix(dtm))
}
directorBinary = convertToBinary(x$Director)
directorBinaryDF = as.data.frame(directorBinary)
See nograpes' answer to "recommenderlab, Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix".
Upvotes: 1