Reputation: 577
I am using glmnet for feature selection on a multinomial model with cross-validation. All is well, but with just under 400 predictors and 4 levels the output becomes a bit messy:
library(glmnet)
# Simulated data: 350 predictors, 5-level response
X <- matrix(rnorm(350000), nrow = 1000, ncol = 350)
colnames(X) <- sample(LETTERS, 350, TRUE)
Y <- factor(sample(LETTERS[1:5], 1000, TRUE), levels = LETTERS[1:5])
# parallel = TRUE needs a registered backend (e.g. doParallel); otherwise cv.glmnet runs sequentially
out.cvfit <- cv.glmnet(x = X, y = Y, standardize = TRUE, family = "multinomial",
                       parallel = TRUE, type.measure = "class")
So then I get this kind of output:
coef.cv.glmnet(out.cvfit,"lambda.1se")
...
$D
351 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 0.06770556
F .
L .
B .
W .
V .
W .
G .
X .
G .
A .
G .
V .
Q .
T .
...
A somewhat contrived example, since all coefficients are zero (the random data has no structure), but you get the idea.
This gets very cumbersome to look at across multiple levels and to summarise the extracted predictors. So, is there a way to extract only the non-zero predictors from a sparse matrix?
Upvotes: 1
Views: 1950
Reputation: 4980
I've saved the relevant output as a. You can then subset it using []. Note that . in a dgCMatrix is recognized as 0.
a <- coef.cv.glmnet(out.cvfit, "lambda.1se")$D   # sparse coefficient matrix for level D
a[a[, 1] != 0, ]                                 # keep only the non-zero coefficients
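A small aside (my addition, not part of the original answer): subsetting with [] drops the result to a plain named vector by default; if you would rather keep the one-column sparse-matrix form, you can pass drop = FALSE:

a[a[, 1] != 0, , drop = FALSE]   # stays a dgCMatrix with row names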
Data used (a smaller version of your example):
library(glmnet)
set.seed(2018)
# Smaller simulated data: 35 predictors, 5-level response
X <- matrix(rnorm(35000), nrow = 1000, ncol = 35)
colnames(X) <- sample(LETTERS, 35, TRUE)
Y <- factor(sample(LETTERS[1:5], 1000, TRUE), levels = LETTERS[1:5])
out.cvfit <- cv.glmnet(x = X, y = Y, standardize = TRUE, family = "multinomial",
                       parallel = TRUE, type.measure = "class")
# Coefficients for level D at lambda.1se, keeping only the non-zero rows
a <- coef.cv.glmnet(out.cvfit, "lambda.1se")$D
a[a[, 1] != 0, ]
(Intercept) R G R T Q L Z
0.017394446 -0.055170396 -0.006943011 0.006151795 0.017039835 -0.009432169 -0.047730565 0.065618965
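If it helps, here is a sketch (my own extension, not from the original answer) of the same idea applied to every level at once, using coef(), which dispatches to coef.cv.glmnet():

# Named list with one sparse coefficient matrix per response level
cf <- coef(out.cvfit, s = "lambda.1se")
# Keep only the non-zero rows in each level's matrix
lapply(cf, function(m) m[m[, 1] != 0, , drop = FALSE])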
Upvotes: 1