Reputation: 1889
I am using scikit-learn for feature selection, and I want to get the score values for all the unigrams in the text. I get the scores, but how do I map them to the actual feature names?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
Texts=["should schools have uniform","schools discipline","legalize marriage","marriage culture"]
labels=["3","3","7","7"]
vectorizer = CountVectorizer()
term_doc=vectorizer.fit_transform(Texts)
ch2 = SelectKBest(chi2, k="all")
X_train = ch2.fit_transform(term_doc, labels)
print(ch2.scores_)
This prints the scores, but how do I know which feature name maps to which score?
Upvotes: 1
Views: 2477
Reputation: 1
To see the feature names, first compute the chi-square test for all features, then match the results to your column names. You can then remove features based on their p-values.
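import pandas as pd
from sklearn.feature_selection import chi2
# df is assumed to be a pandas DataFrame of non-negative features plus an "outcome" column
X = df.drop("outcome", axis=1)
y = df["outcome"]
# chi2 returns a tuple (chi-square statistics, p-values), one entry per column of X
chi_scores = chi2(X, y)
# Index the p-values by column name so each value stays tied to its feature
p_values = pd.Series(chi_scores[1], index=X.columns)
p_values.sort_values(ascending=False, inplace=True)
p_values.plot.bar(figsize=(20, 10))
# True marks features whose p-value is at least 0.5
print(p_values >= 0.5)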
Upvotes: 0