Reputation: 385
I have following dataset
Item Count
A 60
A 20
A 21
B 33
B 33
B 32
Code to reproduce:
import pandas as pd
df = pd.DataFrame([
['A', 60],
['A', 20],
['A', 21],
['B', 33],
['B', 33],
['B', 32],
],
columns=['Item', 'Count'])
Suppose I have to Change only the maximum value of each group of "Item" column by adding 1.
the output should be like this:
Item Count New_Count
A 60 61
A 20 20
A 21 21
B 33 34
B 33 34
B 32 32
I tried df['New_Count']=df.groupby(['Item'])['Count'].transform(lambda x: max(x)+1)
but all the values in "Count" was replaced by max value of each group +1.
Item Count New_Count
A 60 61
A 20 61
A 21 61
B 33 34
B 33 34
B 32 34
Upvotes: 18
Views: 1953
Reputation: 153460
Here's another way not using groupby but using duplicated
df.loc[~df.sort_values('Count', ascending=False).duplicated('Item'), 'Count'] += 1
Output:
Item Count
0 A 61
1 A 20
2 A 21
3 B 34
4 B 33
5 B 32
Upvotes: 5
Reputation: 2811
to change the value in all the maximum values that are repeated you will need .groupby()
, .join()
and np.where()
df = pd.DataFrame([
['A', 60],
['A', 60],
['A', 20],
['A', 21],
['B', 21],
['B', 33],
['B', 34],
], columns=['Item', 'Count'])
s = df.groupby('Item')['Count'].max().rename('newCount')
df = df.set_index('Item').join(s).reset_index()
df['newCount'] = np.where(df['Count'] != df['newCount'], df['Count'], (df['newCount'] + 1))
df.head(10)
#output
Item Count newCount
0 A 60 61
1 A 60 61
2 A 20 20
3 A 21 21
4 B 21 21
5 B 33 33
6 B 34 35
We can replace the .join()
with a .transform()
as suggested by @Dan
df['newCount'] = df.groupby('Item')['Count'].transform('max')
df['newCount'] = np.where(df['Count'] != df['newCount'], df['Count'], (df['newCount'] + 1))
#output
Item Count newCount
0 A 60 61
1 A 60 61
2 A 20 20
3 A 21 21
4 B 21 21
5 B 33 33
6 B 34 35
Upvotes: 1
Reputation: 45752
Use idxmax
:
idx = df.groupby("Item")["Count"].idxmax()
df["New_Count"] = df["Count"]
df.loc[idx, "New_Count"] += 1
This will only increment the first occurrence of th maximum in each group.
If you want to increment all the maximum values in the case of a tie, you can use transform
instead. Just replace the first line above with:
idx = df.groupby("Item")["Count"].transform(max) == df["Count"]
Upvotes: 12
Reputation: 13393
You can use idxmax()
to get the idx of the maximum for each group, and increment only these items, like this:
max_idxs = df.groupby(['Item'])['Count'].idxmax()
df['New_Count']=df['Count'] # copy entire column
df['New_Count'][max_idxs]+=1 # increment only the maximum item for each group by 1
Upvotes: 5