user11534866


How to filter non-zero importance features from sparse matrix?

I have a dataset where most of the columns contain text values, so I used TF-IDF and count vectorizers to convert it into vector form. As a result, I got a sparse matrix. I trained a decision tree on it and got the expected results. Now I want to build another model that uses only the features with non-zero feature importance, but I cannot work out how to select those features.

X_tr
<65548x3101 sparse matrix of type '<class 'numpy.float64'>'
    with 7713590 stored elements in Compressed Sparse Row format>

Here, X_tr is my training dataset.

X_tr.shape
(65548, 3101)

dtc.feature_importances_.shape
(3101,)

Here, 'dtc' is my decision tree classifier model.

My question is: how can I get another sparse matrix that contains only those features whose feature importance is non-zero?

Upvotes: 0

Views: 1012

Answers (1)

ivirshup

Reputation: 661

I think this should be as simple as:

X_tr[:, dtc.feature_importances_ != 0]
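
For reference, here is a minimal, self-contained sketch of the same idea. The data is synthetic and only stands in for the `X_tr` from the question; `y_tr`, `keep`, and `X_tr_selected` are names I made up for illustration. It converts the boolean mask to integer column indices with `np.flatnonzero`, which should select the same columns as indexing with the mask directly.

```python
import numpy as np
from scipy import sparse
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's X_tr (assumption: CSR, float64).
X_tr = sparse.random(200, 30, density=0.2, format="csr", random_state=0)

# Hypothetical labels that depend only on the first two columns,
# so most features should end up with zero importance.
first_two = X_tr[:, :2].toarray()
y_tr = (first_two.sum(axis=1) > 0).astype(int)

dtc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Boolean mask of features the tree actually used.
mask = dtc.feature_importances_ != 0

# Integer column indices of the kept features.
keep = np.flatnonzero(mask)

# Column selection on a CSR matrix returns another sparse matrix.
X_tr_selected = X_tr[:, keep]

print(X_tr.shape, "->", X_tr_selected.shape)
```

If you train a second model on `X_tr_selected`, remember to apply the same column indices to any validation or test matrix (for example `X_te[:, keep]`, where `X_te` is a hypothetical test matrix), so that the columns line up.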

Upvotes: 2
