serlomu
serlomu

Reputation: 21

Sorting for the max on several columns

I have a data frame and I need to sort it samples by several columns. For example, having the below data frame, it needs to sort by the maximum price and maximum horse-power for each different company.

  company     price   horse-power 
  toyota     20000     100
  toyota     20000     125
  bmw        40000     150
  volkswagen 33000     130  
  bmw        45000     200 
  toyota     20000     120
  audi       30000     150
  volkswagen 32000     110
  audi       35000     180
  volkswagen 33000     135

the output should be:

  company     price   horse-power 
  volkswagen 33000     135  
  bmw        45000     200 
  toyota     20000     125
  audi       35000     180

I have a data frame and I need to sort it samples by several columns. For example, having the below data frame, it needs to sort by the maximum price and maximum horse-power for each different company.

  company     price   horse-power 
  toyota     20000     100
  toyota     20000     125
  bmw        40000     150
  volkswagen 33000     130  
  bmw        45000     200 
  toyota     20000     120
  audi       30000     150
  volkswagen 32000     110
  audi       35000     180
  volkswagen 33000     135

the output should be:

  company     price   horse-power 
  volkswagen 33000     135  
  bmw        45000     200 
  toyota     20000     125
  audi       35000     180

Upvotes: 0

Views: 32

Answers (2)

russhoppa
russhoppa

Reputation: 393

If your dataset contains more columns than these and you only want to keep the max of a subset of the columns, do it this way:

subset_to_max = ['price', 'horse-power']
df.sort_values(subset_to_max, ascending=False).drop_duplicates(['company'], keep='first')

Upvotes: 0

Naveed
Naveed

Reputation: 11650

groupby with a max

df.groupby('company',as_index=False).max()
company     price   horse-power
0   audi    35000   180
1   bmw     45000   200
2   toyota  25000   125
3   volkswagen  33000   130

Upvotes: 1

Related Questions