Reputation: 21
I have data(df_movies2) with columns: Year, production companies and revenue generated in that particular year. I want to return for each year, what is the maximum revenue along with the name of production companies. For example, in 2016 Studio Babelsberg has the maximum revenue. This is the data
Here is what I have tried
import pandas as pd
df_movie2.groupby(['year','production_companies']).revenue.max()
But its not working returnning all the names of production companies for each year. Thanks for your help
Upvotes: 0
Views: 144
Reputation: 374
I'm not entirely sure what you're hoping to return. If your output is sorted as you want but you're missing values, it's because the .max()
is dropping duplicates
for your respective year
. Please see edit 1 to return all values in ascending order from max
to min
.
If it's a sorting issue where you want to return the max
value to min
value and aren't worried about dropping duplicate
production_companies
for each year
then refer to edit 2:
import pandas as pd
d = ({
'year' : ['2016','2016','2016','2016','2016','2015','2015','2015','2015','2014','2014','2014','2014'],
'production_companies' : ['Walt Disney Pictures','Universal Pictures','DC Comics','Twentieth Century','Studio Babelsberg','DC Comics','Twentieth Century','Twentieth Century','Universal Pictures','The Kennedy/Marshall Company','Twentieth Century','Village Roadshow Pictures','Columbia Pictures'],
'revenue' : [966,875,873,783,1153,745,543,521,433,415,389,356,349],
})
df = pd.DataFrame(data = d)
Edit 1:
df = df.sort_values(['revenue', 'year'], ascending=[0, 1])
df = df.set_index(['year', 'production_companies'])
Output:
revenue
year production_companies
2016 Studio Babelsberg 1153
Walt Disney Pictures 966
Universal Pictures 875
DC Comics 873
Twentieth Century 783
2015 DC Comics 745
Twentieth Century 543
Twentieth Century 521
Universal Pictures 433
2014 Twentieth Century 389
Village Roadshow Pictures 356
Columbia Pictures 349
The Kennedy/Marshall Company 320
Edit 2:
df = df.groupby(['year','production_companies'])[['revenue']].max()
idx = df['revenue'].max(level=0).sort_values().index
i = pd.CategoricalIndex(df.index.get_level_values(0), ordered=True, categories=idx)
df.index = [i, df.index.get_level_values(1)]
df = df.sort_values(['year','revenue'], ascending=False)
Output:
revenue
year production_companies
2016 Studio Babelsberg 1153
Walt Disney Pictures 966
Universal Pictures 875
DC Comics 873
Twentieth Century 783
2015 DC Comics 745
Twentieth Century 543
Universal Pictures 433
2014 Twentieth Century 389
Village Roadshow Pictures 356
Columbia Pictures 349
The Kennedy/Marshall Company 320
Upvotes: 1