Tom
Tom

Reputation: 1063

How can I get the max value from one column where values match another column?

I have a very large dataframe where one column has a set of names in it. Unfortunately, the same column also has some other data there. What I want to do is to find all the rows that contain the names, look for those in another column, get the max value from a third column. Then, I need to place that value in the same row as the name.

I'm currently using a for loop, which does exactly what I want, but it is extremely slow. I've tried fooling around with lookup but can't get it to work. Just bear in mind that the real dataframe has something like 20,000 rows!

import pandas as pd
import numpy as np

friends = ['Mickey', 'Minnie', 'Goofy', 'Donald', 'Daisy', 'Pluto']

df_data = {'Name':['Mickey', 'Ravens', 'Vikings',
                    'Minnie', 'Packers', 'Browns',
                    'Goofy', 'Cowboys', 'Steelers',
                    'Donald', '49ers', 'Bears',
                    'Daisy', 'Chiefs', 'Raiders',
                    'Pluto', 'Patriots', 'Colts'],
        'Owner': [np.nan, 'Mickey', 'Mickey',
                    np.nan, 'Minnie', 'Minnie', 
                    np.nan, 'Goofy', 'Goofy', 
                    np.nan, 'Donald', 'Donald',
                    np.nan, 'Daisy', 'Daisy',
                    np.nan, 'Pluto', 'Pluto'],
        'Wins': [np.nan, 14, 10,
                np.nan, 13, 6,
                np.nan, 8, 8,
                np.nan, 13, 8,
                np.nan, 12, 7,
                np.nan, 12, 7]}

df = pd.DataFrame(data = df_data)

for f in friends:
    df.loc[df['Name'] == f, 'Wins'] = df.loc[df['Owner'] == f, 'Wins'].max()

print(df.to_string())

Upvotes: 2

Views: 4199

Answers (2)

timgeb
timgeb

Reputation: 78690

IIUC, you can use

df['Wins'].update(df['Name'].map(df.groupby('Owner')['Wins'].max()))

Upvotes: 2

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use, DataFrame.groupby on Owner then use the agg function max to create a mapping series s, Now using the series s update the values in Wins column in df using Series.update:

s = df.groupby('Owner')['Wins'].max()
df = df.set_index('Name')
df['Wins'].update(s)
df = df.reset_index()

# print(df)
        Name   Owner  Wins
0     Mickey     NaN  14.0
1     Ravens  Mickey  14.0
2    Vikings  Mickey  10.0
3     Minnie     NaN  13.0
4    Packers  Minnie  13.0
5     Browns  Minnie   6.0
6      Goofy     NaN   8.0
7    Cowboys   Goofy   8.0
8   Steelers   Goofy   8.0
9     Donald     NaN  13.0
10     49ers  Donald  13.0
11     Bears  Donald   8.0
12     Daisy     NaN  12.0
13    Chiefs   Daisy  12.0
14   Raiders   Daisy   7.0
15     Pluto     NaN  12.0
16  Patriots   Pluto  12.0
17     Colts   Pluto   7.0

Upvotes: 1

Related Questions