Sean
Sean

Reputation: 99

Find Duplicates in One Column, Compare Another Column, Modify a Third Column in DataFrame

Rather than explain this in a use case agnostic way I will provide the columns names as this is much clearer.

I have three columns: PlayerName, Salary, Position.

An example:

PlayerName, Salary, Position
Joe, 3000, FWD
Joe, 4500, FWD
Bill, 3200, CNT
Bill, 2000, CNT
Jill, 1200, GRD
Jill, 2200, GRD

I need to find where the names are the same, and then rename the Position of the row with the higher Salary to CPT ([the original Position value])

The thing I am the most stuck on is how to do a same column comparison not just to find the duplicates (like using np.isin) but to do the comparison coming of of that.

In this example the output would be:

Joe, 3000, FWD
Joe, 4500, CPT (FWD)
Bill, 3200, CPT (CNT)
Bill, 2000, CNT
Jill, 1200, GRD
Jill, 2200, CPT (GRD)

Upvotes: 0

Views: 65

Answers (2)

user3483203
user3483203

Reputation: 51155

You say that there are always 2 duplicate entries, so you can simply use idxmax + loc:

m = df.groupby('PlayerName')['Salary'].idxmax()
df.loc[m, 'Position'] = 'CPT' + df.loc[m, 'Position'].map(' ({})'.format)

  PlayerName  Salary   Position
0        Joe    3000        FWD
1        Joe    4500  CPT (FWD)
2       Bill    3200  CPT (CNT)
3       Bill    2000        CNT
4       Jill    1200        GRD
5       Jill    2200  CPT (GRD)

Upvotes: 2

Sven Harris
Sven Harris

Reputation: 2939

You can find the max salary column by each player, you can then match the index (this will give you True when you are in the row with the maximum salary for the player) and update the string value of position for those indices. Try the below:

player_df = pd.DataFrame(columns=["PlayerName", "Salary", "Position"], data=[
["Joe", 3000, "FWD"],
["Joe", 4500, "FWD"],
["Bill", 3200, "CNT"],
["Bill", 2000, "CNT"],
["Jill", 1200, "GRD"],
["Jill", 2200, "GRD"]])

player_df["top_sal"] = player_df.groupby("PlayerName")["Salary"].transform("max")
player_df.loc[player_df["top_sal"] == player_df["Salary"], "Position"] = "CPT (" + player_df["Position"] + ")"

Upvotes: 2

Related Questions