artemis

Reputation: 7281

Pandas sort not sorting data properly

I am trying to sort the feature_importances_ of a sklearn.ensemble.RandomForestRegressor.

I have the following function:

def get_feature_importances(cols, importances):
    feats = {}
    for feature, importance in zip(cols, importances):
        feats[feature] = importance

    importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
    importances.sort_values(by='Gini-importance')

    return importances

I use it like so:

importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)

And I get the following results:

| PART               | 0.035034 |
| MONTH1             | 0.02507  |
| YEAR1              | 0.020075 |
| MONTH2             | 0.02321  |
| YEAR2              | 0.017861 |
| MONTH3             | 0.042606 |
| YEAR3              | 0.028508 |
| DAYS               | 0.047603 |
| MEDIANDIFF         | 0.037696 |
| F2                 | 0.008783 |
| F1                 | 0.015764 |
| F6                 | 0.017933 |
| F4                 | 0.017511 |
| F5                 | 0.017799 |
| SS22               | 0.010521 |
| SS21               | 0.003896 |
| SS19               | 0.003894 |
| SS23               | 0.005249 |
| SS20               | 0.005127 |
| RR                 | 0.021626 |
| HI_HOURS           | 0.067584 |
| OI_HOURS           | 0.054369 |
| MI_HOURS           | 0.062121 |
| PERFORMANCE_FACTOR | 0.033572 |
| PERFORMANCE_INDEX  | 0.073884 |
| NUMPA              | 0.022445 |
| BUMPA              | 0.024192 |
| ELOH               | 0.04386  |
| FFX1               | 0.128367 |
| FFX2               | 0.083839 |

I thought the line importances.sort_values(by='Gini-importance') would sort them, but the output is still in the original column order. Why isn't the sort taking effect?

Upvotes: 0

Views: 54

Answers (1)

Quang Hoang

Reputation: 150785

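importances.sort_values(by='Gini-importance') returns a new, sorted DataFrame and leaves importances itself unchanged. Your function discards that return value and then returns the unsorted original.

Return the sorted result instead: return importances.sort_values(by='Gini-importance').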

Or you could have sort_values modify the DataFrame in place by passing inplace=True:

importances.sort_values(by='Gini-importance', inplace=True)

return importances
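
To make the difference concrete, here is a minimal sketch of the corrected function. It assumes pandas is imported as pd and uses three values taken from the question's output in place of a fitted model; the variable names unsorted and sorted_importances are illustrative only, and the DataFrame is built directly rather than through a dict, which is a small simplification of the original.

```python
import pandas as pd


def get_feature_importances(cols, importances):
    # Build a one-column DataFrame indexed by feature name.
    frame = pd.DataFrame({'Gini-importance': list(importances)}, index=list(cols))
    # sort_values returns a new DataFrame, so return that result
    # instead of calling it and throwing the result away.
    return frame.sort_values(by='Gini-importance')


# Three rows from the question's output, standing in for
# X_test.columns and rf.feature_importances_.
cols = ['FFX1', 'PART', 'DAYS']
vals = [0.128367, 0.035034, 0.047603]

# Calling sort_values without using its result leaves the original untouched.
unsorted = pd.DataFrame({'Gini-importance': vals}, index=cols)
unsorted.sort_values(by='Gini-importance')  # result discarded, as in the question

sorted_importances = get_feature_importances(cols, vals)
print(sorted_importances)
```

The default order is ascending; if you want the most important features first, pass ascending=False to sort_values.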

Upvotes: 2
