data princess

Reputation: 1158

R's glmnet throwing "A and B inner dimensions must match", but they already do

I should say that although I'm learning glmnet for this problem, I've used the same dataset with other methods and it has worked fine.

In this process, I split my data into training and test sets, all formatted as matrices, and glmnet builds the model without complaining. However, when I try to run a prediction on the holdout set, it throws the following error:

glmfit <- glmnet(train_x_mat,train_y_mat, alpha=1)
glmpred <- predict(glmfit, s=glmfit$lambda.1se, newx = test_x_mat)
# output:
Error in cbind2(1, newx) %*% nbeta : 
Cholmod error 'A and B inner dimensions must match' at file ../MatrixOps/cholmod_ssmult.c, line 82

However, I know that train_x and test_x have the same number of columns:

ncol(test_x)
[1] 146
ncol(train_x)
[1] 146

I'm fairly new to glmnet; is there something more I need to do to make it cooperate?

Edit:

Here are the dimensions of the matrices themselves. Apologies for originally posting the column counts of the data frames instead; this may be closer to the heart of the problem.

dim(train_x_mat)
[1] 1411  208
dim(test_x_mat)
[1] 352 204

This is strange, because both are created the same way:

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x, verbose = FALSE)
test_x_mat <- sparse.model.matrix(~ . - 1, data = test_x, verbose = FALSE)

Upvotes: 2

Views: 2017

Answers (1)

data princess

Reputation: 1158

For anyone else who runs into this problem even though it seems like they shouldn't: the issue is with how R's sparse.model.matrix encodes factors. It expands each factor into indicator columns, one per level. If the two data frames do not carry the same set of levels for a factor, which easily happens when a rare level lands in only one split of a small dataset, the training and test matrices end up with different columns (208 versus 204 in the question).
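To see the effect in isolation, here is a minimal sketch on made-up data. The toy data frames and the names missing_in_test and missing_in_train are hypothetical, chosen only for illustration. It assumes the Matrix package is installed, and it uses setdiff on the column names to show which columns differ.

```r
library(Matrix)

# Hypothetical toy data: level "c" of grp lands only in the training split.
train_x <- data.frame(num = c(1.5, 2.0, 3.5), grp = c("a", "b", "c"),
                      stringsAsFactors = TRUE)
test_x  <- data.frame(num = c(0.5, 4.0), grp = c("a", "b"),
                      stringsAsFactors = TRUE)

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

# The data frames have the same number of columns...
ncol(train_x)
ncol(test_x)

# ...but the model matrices do not, because grp expands differently.
ncol(train_x_mat)
ncol(test_x_mat)

# Columns present in one matrix but absent from the other.
missing_in_test  <- setdiff(colnames(train_x_mat), colnames(test_x_mat))
missing_in_train <- setdiff(colnames(test_x_mat), colnames(train_x_mat))
missing_in_test
missing_in_train
```

Running the same two setdiff calls on the real train_x_mat and test_x_mat should list exactly which factor levels are responsible for the mismatch.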

A solution, then, is either to add all-zero columns to whichever matrix lacks them, or to remove the columns that aren't shared by both. If you're building a model and expect new data, the former is preferable, since the fitted model expects every column it was trained on. Either way, the mismatch is a sign that the dataset may be too small for the job.
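One way to do the padding is sketched below, again on made-up data. The helper align_columns is hypothetical (it is not part of glmnet or Matrix), and the toy data frames are invented for illustration. It pads the new matrix with all-zero sparse columns for anything the reference matrix has and the new one lacks, then reorders to the reference layout, which also drops columns the model never saw.

```r
library(Matrix)

# Hypothetical helper: give `new_mat` exactly the columns of `ref_mat`, in the same order.
align_columns <- function(new_mat, ref_mat) {
  missing_cols <- setdiff(colnames(ref_mat), colnames(new_mat))
  if (length(missing_cols) > 0) {
    # Empty (all-zero) sparse block for the columns the new data lacks.
    pad <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                        dims = c(nrow(new_mat), length(missing_cols)),
                        dimnames = list(NULL, missing_cols))
    new_mat <- cbind(new_mat, pad)
  }
  # Reorder to the reference layout; columns unknown to ref_mat are dropped here.
  new_mat[, colnames(ref_mat), drop = FALSE]
}

# Hypothetical toy data: "b" and "c" occur only in training, "d" only in test.
train_x <- data.frame(num = c(1.5, 2.0, 3.5), grp = c("a", "b", "c"),
                      stringsAsFactors = TRUE)
test_x  <- data.frame(num = c(0.5, 4.0), grp = c("a", "d"),
                      stringsAsFactors = TRUE)

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

test_aligned <- align_columns(test_x_mat, train_x_mat)
dim(test_aligned)
colnames(test_aligned)
```

The aligned matrix can then be passed as newx to predict. Note that a row whose level was never seen in training (here "d") ends up with zeros in every indicator column for that factor, so its prediction should be treated with caution.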

Upvotes: 4
