Prem

Reputation: 61

Uncommon error message converting Matrix to Sparse in R

I'm trying to run a LASSO on our dataset, and to do so I need to convert non-numeric variables to numeric, ideally via a sparse matrix. However, whenever I try to use the Matrix function, I get the following error:

Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix

I thought this was due to NAs in my data, so I ran na.omit, but I got the same error. I then tried a small subset of my data and got the same error again:

> sparsecombined <- Matrix(combined1[1:10,],sparse=TRUE)
Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix

This is the data set I tried to convert with that last line of code:

[Screenshot of the first 10 rows of combined1 (image not available)]

Is there anything that jumps out that might prevent sparse conversion?

Upvotes: 5

Views: 2582

Answers (3)

josliber

Reputation: 44340

I got this error when passing a data frame where a matrix was expected, and it looks like that's why you are getting it too. The solution is simple: convert your data to a matrix before passing it to the Matrix function:

sparsecombined <- Matrix(as.matrix(combined1[1:10,]),sparse=TRUE)

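In your case, this code will probably still fail because some of your columns are non-numeric (e.g. the TailNum column). as.matrix() on a data frame with any character or factor column returns a character matrix, which Matrix cannot store as a sparse numeric matrix. So you would need to select just the numeric columns first.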
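A minimal sketch of both steps (select the numeric columns, convert to a base matrix, then call Matrix()), assuming the Matrix package is installed. The data frame below is a hypothetical stand-in for combined1 built from column names mentioned in this thread, not the asker's real data.

```r
library(Matrix)

# Hypothetical stand-in for combined1 (toy values, not the real data)
combined1 <- data.frame(
  ArrDelay = c(5, 0, -3, 0),
  ArrTime  = c(1030, 0, 1415, 900),
  TailNum  = c("N123", "N456", "N789", "N123"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if only one column survives the selection
num_cols <- vapply(combined1, is.numeric, logical(1))
num_mat  <- as.matrix(combined1[, num_cols, drop = FALSE])

# A plain numeric matrix is what Matrix() expects
sparsecombined <- Matrix(num_mat, sparse = TRUE)
```

Dropping TailNum here loses that information; to keep categorical columns, one-hot encode them instead of discarding them.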

Upvotes: 0

Hong Ooi

Reputation: 57696

The easiest way to incorporate categorical variables into a LASSO is to use my glmnetUtils package, which provides a formula/data frame interface to glmnet.

library(glmnetUtils)

glmnet(ArrDelay ~ ArrTime + UniqueCarrier + TailNum + Origin + Dest,
       data = combined1, sparse = TRUE)

This automatically handles categorical vars via one-hot encoding (also known as dummy variables). It can also use sparse matrices if so desired.
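To show what one-hot encoding into a sparse matrix looks like without glmnetUtils, here is a minimal sketch using sparse.model.matrix() from the Matrix package on a hypothetical toy data frame. It is not necessarily how glmnetUtils builds its matrix internally.

```r
library(Matrix)

# Hypothetical toy data using column names from the question
flights <- data.frame(
  ArrDelay      = c(12, -4, 0, 30, 7),
  ArrTime       = c(1030, 1415, 900, 2210, 1300),
  UniqueCarrier = factor(c("AA", "UA", "AA", "DL", "UA")),
  Origin        = factor(c("JFK", "LAX", "JFK", "ATL", "ORD"))
)

# Expand each factor into 0/1 dummy columns, stored sparsely;
# [, -1] drops the intercept column because glmnet fits its own intercept
X <- sparse.model.matrix(~ ArrTime + UniqueCarrier + Origin, data = flights)[, -1]
y <- flights$ArrDelay

# X and y can then be passed to glmnet::glmnet(X, y), which accepts sparse matrices
```

With treatment contrasts, the first level of each factor (AA for UniqueCarrier, ATL for Origin) acts as the baseline and gets no column of its own.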

Upvotes: 2

DRozen

Reputation: 427

I think the error is due to the non-numeric columns in your data.

Perhaps first convert your non-numeric columns, such as UniqueCarrier, to binary vectors using one-hot encoding, and only then convert the matrix to sparse.

Here is the code I used for that conversion:

# Convert a comma-separated categorical column (e.g. Genre or Director) into binary variables.
# Each row becomes a "document" in a tm corpus and each distinct value becomes a term,
# so the document-term matrix has 1s where a value is present in a row and 0s otherwise.
library(tm)

convertToBinary <- function(category) {
  # Split each entry on commas (with optional surrounding whitespace)
  parts <- strsplit(as.character(category), "\\s*,\\s*")

  # Replace spaces inside multi-word names (e.g. director names) with underscores,
  # then rejoin each row's values into one space-separated string
  docs <- vapply(parts, function(p) paste(gsub(" ", "_", p), collapse = " "),
                 character(1))

  corpus <- VCorpus(VectorSource(docs))

  # weightBin records presence (1) / absence (0); wordLengths = c(1, Inf) keeps short
  # values such as carrier codes, which tm drops by default (minimum word length 3)
  dtm <- DocumentTermMatrix(corpus,
                            control = list(weighting = weightBin,
                                           wordLengths = c(1, Inf)))

  # as.matrix() returns the full 0/1 matrix; recent tm versions' inspect() prints only a sample
  as.matrix(dtm)
}

# x is the data frame holding the Director column
directorBinary <- convertToBinary(x$Director)
directorBinaryDF <- as.data.frame(directorBinary)
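As an alternative to the tm-based function, here is a minimal base R plus Matrix sketch of the same order of operations: one-hot encode a categorical column with model.matrix(), combine it with the numeric columns, and only then convert to sparse. The data frame is a hypothetical toy example, not the asker's data.

```r
library(Matrix)

# Hypothetical toy data (values made up for illustration)
combined1 <- data.frame(
  ArrDelay      = c(5, 0, -3, 0),
  ArrTime       = c(1030, 0, 1415, 900),
  UniqueCarrier = factor(c("AA", "UA", "AA", "DL"))
)

# Step 1: one-hot encode UniqueCarrier; "- 1" gives one indicator column per carrier
carrier_dummies <- model.matrix(~ UniqueCarrier - 1, data = combined1)

# Step 2: bind the dummies to the numeric columns, giving a purely numeric base matrix
num_mat <- cbind(as.matrix(combined1[, c("ArrDelay", "ArrTime")]), carrier_dummies)

# Step 3: only now convert to a sparse Matrix
sparse_all <- Matrix(num_mat, sparse = TRUE)
```

Building the dense dummy matrix first is fine for small data; for many rows or levels, sparse.model.matrix() avoids the dense intermediate.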

See nograpes' answer to "recommenderlab, Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix".

Upvotes: 1
