AMisra

Reputation: 1889

How to get feature names corresponding to scores for chi-square feature selection in scikit-learn

I am using scikit-learn for feature selection, and I want to get the score for every unigram in the text. I get the scores, but how do I map them to the actual feature names?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

texts = ["should schools have uniform", "schools discipline", "legalize marriage", "marriage culture"]
labels = ["3", "3", "7", "7"]
vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(texts)
# k must be passed by keyword in recent scikit-learn versions
ch2 = SelectKBest(chi2, k="all")
X_train = ch2.fit_transform(term_doc, labels)
print(ch2.scores_)

This gives the scores, but how do I know which feature name each score belongs to?

Upvotes: 1

Views: 2477

Answers (2)

Pankaj Kumar Yadav

Reputation: 1

To see the feature names, first compute the chi-square statistics for all features, then match them to your columns. Based on the p-values, you can then remove features.

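import pandas as pd
from sklearn.feature_selection import chi2

# df is your own DataFrame: non-negative feature columns plus an "outcome" target column
X = df.drop("outcome", axis=1)
y = df["outcome"]

# chi2 returns a tuple (chi-square scores, p-values), one entry per column of X
chi_scores = chi2(X, y)

# Label each p-value with its column name
p_values = pd.Series(chi_scores[1], index=X.columns)
p_values.sort_values(ascending=False, inplace=True)

p_values.plot.bar(figsize=(20, 10))

# Features with a high p-value (here >= 0.5) show little dependence on the target
print(p_values >= 0.5)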

Upvotes: 0

cfh

Reputation: 4666

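It's right there in the CountVectorizer documentation:

vectorizer.get_feature_names()

It returns the feature names in column order, so the i-th name corresponds to ch2.scores_[i]. In scikit-learn 1.0 and later, use get_feature_names_out() instead; get_feature_names() was removed in version 1.2.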

Upvotes: 3
