Shweta Kapoor
Shweta Kapoor

Reputation: 21

How to find name of column along with the maximum value

I have data(df_movies2) with columns: Year, production companies and revenue generated in that particular year. I want to return for each year, what is the maximum revenue along with the name of production companies. For example, in 2016 Studio Babelsberg has the maximum revenue. This is the data

Here is what I have tried

  import pandas as pd 
df_movie2.groupby(['year','production_companies']).revenue.max()

But its not working returnning all the names of production companies for each year. Thanks for your help

Upvotes: 0

Views: 144

Answers (1)

jonboy
jonboy

Reputation: 374

I'm not entirely sure what you're hoping to return. If your output is sorted as you want but you're missing values, it's because the .max() is dropping duplicates for your respective year. Please see edit 1 to return all values in ascending order from max to min.

If it's a sorting issue where you want to return the max value to min value and aren't worried about dropping duplicate production_companies for each year then refer to edit 2:

import pandas as pd

d = ({
    'year' : ['2016','2016','2016','2016','2016','2015','2015','2015','2015','2014','2014','2014','2014'],        
    'production_companies' : ['Walt Disney Pictures','Universal Pictures','DC Comics','Twentieth Century','Studio Babelsberg','DC Comics','Twentieth Century','Twentieth Century','Universal Pictures','The Kennedy/Marshall Company','Twentieth Century','Village Roadshow Pictures','Columbia Pictures'],                 
    'revenue' : [966,875,873,783,1153,745,543,521,433,415,389,356,349],                                     
     })

df = pd.DataFrame(data = d)

Edit 1:

df = df.sort_values(['revenue', 'year'], ascending=[0, 1])
df = df.set_index(['year', 'production_companies'])

Output:

                                   revenue
year production_companies                 
2016 Studio Babelsberg                1153
     Walt Disney Pictures              966
     Universal Pictures                875
     DC Comics                         873
     Twentieth Century                 783
2015 DC Comics                         745
     Twentieth Century                 543
     Twentieth Century                 521
     Universal Pictures                433
2014 Twentieth Century                 389
     Village Roadshow Pictures         356
     Columbia Pictures                 349
     The Kennedy/Marshall Company      320

Edit 2:

df = df.groupby(['year','production_companies'])[['revenue']].max()
idx = df['revenue'].max(level=0).sort_values().index
i = pd.CategoricalIndex(df.index.get_level_values(0), ordered=True, categories=idx)
df.index = [i, df.index.get_level_values(1)]
df = df.sort_values(['year','revenue'], ascending=False)

Output:

                                   revenue
year production_companies                 
2016 Studio Babelsberg                1153
     Walt Disney Pictures              966
     Universal Pictures                875
     DC Comics                         873
     Twentieth Century                 783
2015 DC Comics                         745
     Twentieth Century                 543
     Universal Pictures                433
2014 Twentieth Century                 389
     Village Roadshow Pictures         356
     Columbia Pictures                 349
     The Kennedy/Marshall Company      320

Upvotes: 1

Related Questions