Reputation: 99
Rather than explain this in a use case agnostic way I will provide the columns names as this is much clearer.
I have three columns: PlayerName, Salary, Position.
An example:
PlayerName, Salary, Position
Joe, 3000, FWD
Joe, 4500, FWD
Bill, 3200, CNT
Bill, 2000, CNT
Jill, 1200, GRD
Jill, 2200, GRD
I need to find where the names are the same, and then rename the Position of the row with the higher Salary to CPT ([the original Position value])
The thing I am the most stuck on is how to do a same column comparison not just to find the duplicates (like using np.isin) but to do the comparison coming of of that.
In this example the output would be:
Joe, 3000, FWD
Joe, 4500, CPT (FWD)
Bill, 3200, CPT (CNT)
Bill, 2000, CNT
Jill, 1200, GRD
Jill, 2200, CPT (GRD)
Upvotes: 0
Views: 65
Reputation: 51155
You say that there are always 2 duplicate entries, so you can simply use idxmax
+ loc
:
m = df.groupby('PlayerName')['Salary'].idxmax()
df.loc[m, 'Position'] = 'CPT' + df.loc[m, 'Position'].map(' ({})'.format)
PlayerName Salary Position
0 Joe 3000 FWD
1 Joe 4500 CPT (FWD)
2 Bill 3200 CPT (CNT)
3 Bill 2000 CNT
4 Jill 1200 GRD
5 Jill 2200 CPT (GRD)
Upvotes: 2
Reputation: 2939
You can find the max salary column by each player, you can then match the index (this will give you True when you are in the row with the maximum salary for the player) and update the string value of position for those indices. Try the below:
player_df = pd.DataFrame(columns=["PlayerName", "Salary", "Position"], data=[
["Joe", 3000, "FWD"],
["Joe", 4500, "FWD"],
["Bill", 3200, "CNT"],
["Bill", 2000, "CNT"],
["Jill", 1200, "GRD"],
["Jill", 2200, "GRD"]])
player_df["top_sal"] = player_df.groupby("PlayerName")["Salary"].transform("max")
player_df.loc[player_df["top_sal"] == player_df["Salary"], "Position"] = "CPT (" + player_df["Position"] + ")"
Upvotes: 2