I should say that although I'm learning glmnet for this problem, I've used the same dataset with other methods and it has worked fine.
In this process, I split my data into training and test sets, all formatted as matrices, and glmnet builds the model without complaining. However, when I try to run a prediction on the holdout set, it throws the following error:
glmfit <- glmnet(train_x_mat, train_y_mat, alpha = 1)
# note: lambda.1se exists only on cv.glmnet fits; on a plain glmnet fit it is NULL
glmpred <- predict(glmfit, s = glmfit$lambda.1se, newx = test_x_mat)
# output:
Error in cbind2(1, newx) %*% nbeta :
Cholmod error 'A and B inner dimensions must match' at file ../MatrixOps/cholmod_ssmult.c, line 82
However, I know that train_x and test_x have the same number of columns:
ncol(test_x)
[1] 146
ncol(train_x)
[1] 146
I'm fairly new to glmnet; is there something more I need to do to make it cooperate?
Here are the dimensions of the matrices actually passed to glmnet. Apologies for originally posting the column counts of the source data. This may be closer to the heart of the problem.
dim(train_x_mat)
[1] 1411 208
dim(test_x_mat)
[1] 352 204
Which is strange, because they are created this way:
train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x, verbose = FALSE)
test_x_mat <- sparse.model.matrix(~ . - 1, data = test_x, verbose = FALSE)
Upvotes: 2
Views: 2017
Reputation: 1158
For anyone else who runs into this problem even though it seems like they shouldn't, the issue is specifically with how R's sparse.model.matrix encodes factors. It expands each factor level present in the data into its own column. Thus, if your dataset isn't particularly large and a level appears in only one of the splits, your training and test matrices will end up with different columns.
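To make the mechanism concrete, here is a minimal sketch with made-up toy data (the column names num and grp and their values are hypothetical, not from my dataset). The two data frames have the same number of columns, but the level "c" occurs only in the training split, so the resulting matrices differ:

```r
library(Matrix)

# Toy data (hypothetical): level "c" of `grp` appears only in the training split.
train_x <- data.frame(num = c(1, 2, 3, 4),
                      grp = factor(c("a", "b", "c", "a")))
test_x  <- data.frame(num = c(5, 6),
                      grp = factor(c("a", "b")))

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

# The data frames agree on column count...
ncol(train_x) == ncol(test_x)

# ...but the model matrices do not, because each observed level gets a column.
dim(train_x_mat)
dim(test_x_mat)

# Columns present in training but absent from test.
missing_in_test <- setdiff(colnames(train_x_mat), colnames(test_x_mat))
missing_in_test
```

Running setdiff in both directions on the real matrices should show exactly which levels are responsible for the mismatch.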
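A solution, then, is either to add all-zero columns to whichever matrix lacks them, or else to remove the columns that aren't shared by both. Of course, if you're building a model and expecting new data, the former is preferable; in either case the test matrix must end up with the same columns, in the same order, as the training matrix. But anyway, the whole problem suggests that your dataset is too small, or some factor levels too rare, for every level to show up in both splits.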
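As a rough sketch of the first option, the helper below pads a new matrix with zero columns for anything the training matrix has and it lacks, then drops and reorders columns to match. The name align_columns and the toy data are my own illustration, not part of glmnet or Matrix, and the sketch assumes both matrices carry column names:

```r
library(Matrix)

# Hypothetical helper: give `new_mat` exactly the columns in `ref_cols`, in that order.
align_columns <- function(new_mat, ref_cols) {
  missing_cols <- setdiff(ref_cols, colnames(new_mat))
  if (length(missing_cols) > 0) {
    # Empty (all-zero) sparse block for levels absent from the new data.
    filler <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                           dims = c(nrow(new_mat), length(missing_cols)),
                           dimnames = list(NULL, missing_cols))
    new_mat <- cbind(new_mat, filler)
  }
  # Drop columns never seen in training and reorder to the training layout.
  new_mat[, ref_cols, drop = FALSE]
}

# Toy data (hypothetical): "c" occurs only in training, "d" only in test.
train_x <- data.frame(num = c(1, 2, 3, 4),
                      grp = factor(c("a", "b", "c", "a")))
test_x  <- data.frame(num = c(5, 6, 7),
                      grp = factor(c("a", "b", "d")))

train_x_mat <- sparse.model.matrix(~ . - 1, data = train_x)
test_x_mat  <- sparse.model.matrix(~ . - 1, data = test_x)

test_x_aligned <- align_columns(test_x_mat, colnames(train_x_mat))
identical(colnames(test_x_aligned), colnames(train_x_mat))
```

The aligned matrix can then be passed as newx to predict. Note that a row whose level was never seen in training (here "d") ends up with zeros in every level column, so its prediction is only as good as that approximation.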