Reputation: 385

How to change only the maximum value of a group in pandas dataframe

I have following dataset

Item Count
A    60
A    20
A    21
B    33
B    33
B    32

Code to reproduce:

import pandas as pd
df = pd.DataFrame([
    ['A', 60],
    ['A', 20],
    ['A', 21],
    ['B', 33],
    ['B', 33],
    ['B', 32],
], 
    columns=['Item', 'Count'])

Suppose I have to Change only the maximum value of each group of "Item" column by adding 1.

the output should be like this:

Item Count New_Count
A    60    61
A    20    20
A    21    21
B    33    34
B    33    34
B    32    32

I tried df['New_Count']=df.groupby(['Item'])['Count'].transform(lambda x: max(x)+1) but all the values in "Count" was replaced by max value of each group +1.

Item Count New_Count
A    60    61
A    20    61
A    21    61
B    33    34
B    33    34
B    32    34

Upvotes: 18

Answers (4)

Scott Boston

Reputation: 153460

Here's another way not using groupby but using duplicated

df.loc[~df.sort_values('Count', ascending=False).duplicated('Item'), 'Count'] += 1

Output:

  Item  Count
0    A     61
1    A     20
2    A     21
3    B     34
4    B     33
5    B     32

Upvotes: 5

Terry

Reputation: 2811

to change the value in all the maximum values that are repeated you will need .groupby(), .join() and np.where()

df = pd.DataFrame([
['A', 60],
['A', 60],
['A', 20],
['A', 21],
['B', 21],
['B', 33],
['B', 34],
], columns=['Item', 'Count'])

s = df.groupby('Item')['Count'].max().rename('newCount')
df = df.set_index('Item').join(s).reset_index()
df['newCount'] = np.where(df['Count'] != df['newCount'], df['Count'], (df['newCount'] + 1))
df.head(10)
#output
    Item    Count   newCount
0   A       60      61
1   A       60      61
2   A       20      20
3   A       21      21
4   B       21      21
5   B       33      33
6   B       34      35

Edit

We can replace the .join() with a .transform() as suggested by @Dan

df['newCount'] = df.groupby('Item')['Count'].transform('max')
df['newCount'] = np.where(df['Count'] != df['newCount'], df['Count'], (df['newCount'] + 1))
#output
    Item    Count   newCount
0   A       60      61
1   A       60      61
2   A       20      20
3   A       21      21
4   B       21      21
5   B       33      33
6   B       34      35

Upvotes: 1

Dan

Reputation: 45752

Use idxmax:

idx = df.groupby("Item")["Count"].idxmax()
df["New_Count"] = df["Count"]
df.loc[idx, "New_Count"] += 1

This will only increment the first occurrence of th maximum in each group. If you want to increment all the maximum values in the case of a tie, you can use transform instead. Just replace the first line above with:

idx = df.groupby("Item")["Count"].transform(max) == df["Count"]

Upvotes: 12

Adam.Er8

Reputation: 13393

You can use idxmax() to get the idx of the maximum for each group, and increment only these items, like this:

max_idxs = df.groupby(['Item'])['Count'].idxmax()
df['New_Count']=df['Count'] # copy entire column
df['New_Count'][max_idxs]+=1 # increment only the maximum item for each group by 1

Upvotes: 5

How to change only the maximum value of a group in pandas dataframe

Answers (4)

Edit

Related Questions