jceg316
How do I view model.feature_importances_ output with the names of the features?

I've built a DecisionTreeClassifier model in Python and would like to see the importance of each feature. As I'm using sklearn, I've converted all my categorical columns to numbers. Here's how I've imported the data:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

raw_data = pd.read_csv('Video_Games_Sales_as_at_22_Dec_2016.csv')
no_na_df = raw_data.dropna(how='any')

After getting rid of the NAs, I created a copy of the DataFrame for the numeric conversion:

numeric_df = no_na_df.copy()
cols = ['Platform','Genre','Publisher','Developer','Rating']
numeric_df[cols] = numeric_df[cols].apply(lambda x: pd.factorize(x)[0]+1)
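To see what the factorize line does, here is a small sketch on a hypothetical toy DataFrame (the values are made up, not taken from the video-game dataset). pd.factorize returns a (codes, uniques) pair; the codes are assigned in order of first appearance starting at 0, so the +1 makes them start at 1.

```python
import pandas as pd

# Hypothetical toy data standing in for the categorical columns
df = pd.DataFrame({
    "Genre": ["Sports", "Racing", "Sports", "Puzzle"],
    "Rating": ["E", "E", "T", "E"],
})
cols = ["Genre", "Rating"]

# pd.factorize(x)[0] gives integer codes in order of first appearance (0, 1, 2, ...);
# adding 1 shifts them so the first category seen becomes 1
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)
print(df)
```

Here "Sports" becomes 1, "Racing" 2 and "Puzzle" 3, independently for each column.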

Once that's done, I created the test and train split:

X = numeric_df.drop(['Name','Global_Sales_Bin','Global_Sales','NA_Sales','EU_Sales','JP_Sales','Other_Sales'], axis = 1)
y = numeric_df['Global_Sales_Bin']

X = np.array(X)
y = np.array(y)

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.3, random_state = 0)
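The names are lost at the X = np.array(X) step, not in the split: scikit-learn's train_test_split and DecisionTreeClassifier both accept pandas objects directly. A sketch on made-up toy data (only the column names mirror the question's dataset) that skips the conversion and reads the names from X_train.columns:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; only the column names mirror the question's dataset
X = pd.DataFrame({
    "Platform": [1, 2, 1, 3, 2, 1, 3, 2, 1, 3],
    "Genre": [1, 1, 2, 2, 3, 3, 1, 2, 3, 1],
    "Critic_Score": [76, 82, 80, 89, 58, 87, 65, 70, 91, 60],
})
y = pd.Series([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])

# No np.array conversion: for DataFrame input, train_test_split returns DataFrames,
# so X_train keeps its column names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# feature_importances_ follows the column order of X_train
pairs = list(zip(X_train.columns, model.feature_importances_))
print(pairs)
```

The individual values depend on the toy data; what matters is that each importance is paired with the column it was computed for.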

I then fitted the model, got my results, and wanted to see the importance of each feature:

model.feature_importances_

which output this:

array([ 0.08518705,  0.07874186,  0.06322593,  0.08446309,  0.08410844,
        0.08097326,  0.07744228,  0.1851621 ,  0.23597441,  0.02472158])

I don't know how to match up the features in the model with the numbers above. Both 'X' and 'model.feature_importances_' are NumPy arrays, so neither carries column names, and the original DataFrame had several columns dropped before fitting, so its columns don't line up with the values. I think I might have to use a for loop and zip, but I'm not sure how.
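Because the array carries no names, the only link to the features is position: entry i belongs to column i of the X the model was fitted on. A quick check on the values printed above (copied verbatim): for a fitted tree the importances are normalized, so they should add up to 1, and np.argmax gives the column index of the most important feature.

```python
import numpy as np

# The feature_importances_ values printed in the question
importances = np.array([0.08518705, 0.07874186, 0.06322593, 0.08446309, 0.08410844,
                        0.08097326, 0.07744228, 0.1851621, 0.23597441, 0.02472158])

# Tree importances are normalized, so they sum to 1 (up to print rounding)
total = float(importances.sum())
print(round(total, 6))

# Position i corresponds to column i of the X the model was fitted on
for i, value in enumerate(importances):
    print(i, value)

top = int(np.argmax(importances))
print("most important column index:", top)
```

So the largest value sits at index 8, the ninth column of X; zipping the column names with this array in the same order turns those positions into names.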

Thanks.

Answers (1)

jceg316
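This ended up working: save the column names while X is still a DataFrame, i.e. before the X = np.array(X) line (a NumPy array has no .columns attribute):

X_columns = X.columns

and then, after fitting the model, pair them with the importances:

list(zip(X_columns, model.feature_importances_))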

Output:

[('Platform', 0.085187050413710552),
 ('Year_of_Release', 0.078741862224430401),
 ('Genre', 0.063225925635322172),
 ('Publisher', 0.084463091000316695),
 ('Critic_Score', 0.084108440698256848),
 ('Critic_Count', 0.080973259803115372),
 ('User_Score', 0.077442278687036153),
 ('User_Count', 0.18516210213713488),
 ('Developer', 0.23597440837370295),
 ('Rating', 0.024721581026973961)]
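To rank the features, the pairs can be put into a pandas Series indexed by name and sorted. This sketch copies the names and values from the output above; with a fitted model you would pass model.feature_importances_ instead of the hard-coded list.

```python
import pandas as pd

# Names and values copied from the output above
X_columns = ["Platform", "Year_of_Release", "Genre", "Publisher", "Critic_Score",
             "Critic_Count", "User_Score", "User_Count", "Developer", "Rating"]
importances = [0.085187050413710552, 0.078741862224430401, 0.063225925635322172,
               0.084463091000316695, 0.084108440698256848, 0.080973259803115372,
               0.077442278687036153, 0.18516210213713488, 0.23597440837370295,
               0.024721581026973961]

# A Series indexed by feature name, sorted from most to least important
ranked = pd.Series(importances, index=X_columns).sort_values(ascending=False)
print(ranked)
```

On scikit-learn 1.0 or later, if the model is fitted on the DataFrame itself (no np.array conversion), the fitted estimator also stores the column names in model.feature_names_in_, so pd.Series(model.feature_importances_, index=model.feature_names_in_) works without keeping a separate X_columns.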
