Reputation: 157
How can I sort values by columns, and then only keep n number of columns? This is what I'm struggling with:
import pandas as pd
import numpy as np
file = ('example.csv')
df = pd.read_csv(file)
df = df[['ID', 'Hero', 'Customer']].drop_duplicates('ID', keep="first")
df.head()
Output:
ID | Hero | Customer
45 Rambo Nils
90 Superman Sophia
33 Superman Sophia
55 Rambo Sophia
12 Hulk Sophia
To get into the format I want it (heatmap in this case), I've done this:
heatmap = df.groupby(['Hero', 'Customer']).size().unstack(fill_value=0)
heatmap
Output:
Ida Jonas Morgan Sophia Nils
Rambo 0 0 3 11 1
Superman 2 0 0 66 0
Hulk 0 0 0 7 0
What I want to do is to sort it by descending order so that the column with the highest values get pushed to the left, like this:
Wanted output:
Sophia Morgan Ida Nils Jonas
Rambo 11 3 0 1 0
Superman 66 0 2 0 0
Hulk 7 0 0 0 0
And if I would like to keep n-numbers of customers, how do I do that? Another problem I have is that the ID's are getting lost in the data manipulation, and I'm not sure how to deal with it after the count()-function. I feel like I've done it unnecessary complicated (?).
Upvotes: 1
Views: 171
Reputation: 3598
Having df
:
Ida Jonas Morgan Sophia Nils
Rambo 0 0 3 11 1
Superman 2 0 0 66 0
Hulk 0 0 0 7 0
try:
df.reindex(df.sum().sort_values(ascending = False).index, axis=1)
tesult:
Sophia Morgan Ida Nils Jonas
Rambo 11 3 0 1 0
Superman 66 0 2 0 0
Hulk 7 0 0 0 0
Upvotes: 1