Reputation: 875
I have dataframe that looks like this:
Auction_id bid_price min_bid rank
123 5 3 1
123 4 3 2
124 3 2 1
124 1 2 2
I'd like to create another column that returns MAX(rank 1 min_bid, rank 2 bid_price). I don't care what appears for the rank 2 column values. I'm hoping for the result to look something like this:
Auction_id bid_price min_bid rank custom_column
123 5 3 1 4
123 4 3 2 NaN/Don't care
124 3 2 1 2
124 1 2 2 NaN/Don't care
Should I be iterating through grouped auction_ids? Can someone provide the topics one would need to be familiar with to tackle this type of problem?
Upvotes: 0
Views: 91
Reputation: 8493
Here's an approach that does some reshaping with pivot()
Auction_id bid_price min_bid rank
0 123 5 3 1
1 123 4 3 2
2 124 3 2 1
3 124 1 2 2
Then reshape your frame (df)
pv = df.pivot("Auction_id","rank")
pv
bid_price min_bid
rank 1 2 1 2
Auction_id
123 5 4 3 3
124 3 1 2 2
Adding a column to pv that contains the max. I"m using iloc to get a slice of the pv dataframe.
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
pv
bid_price min_bid custom_column
rank 1 2 1 2
Auction_id
123 5 4 3 3 4
124 3 1 2 2 2
and then add the max to the original frame (df) by mapping to our pv frame
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 NaN
2 124 3 2 1 2
3 124 1 2 2 NaN
all the steps combined
pv = df.pivot("Auction_id","rank")
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 NaN
2 124 3 2 1 2
3 124 1 2 2 NaN
Upvotes: 2
Reputation: 109726
First, set the index equal to the Auction_id
. Then you can use loc
to select the appropriate values for each Auction_id
and use max on their values. Finally, reset your index to return to your initial state.
df.set_index('Auction_id', inplace=True)
df['custom_column'] = pd.concat([df.loc[df['rank'] == 1, 'min_bid'],
df.loc[df['rank'] == 2, 'bid_price']],
axis=1).max(axis=1)
df.reset_index(inplace=True)
>>> df
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 4
2 124 3 2 1 2
3 124 1 2 2 2
Upvotes: 2
Reputation: 77027
Here's one crude way to do it.
Create maxminbid()
function, which creates a val=
MAX(rank 1 min_bid, rank 2 bid_price) and assign this to grp['custom_column']
, and for rank==2
store it with NaN
def maxminbid(grp):
val = max(grp.loc[grp['rank']==1, 'min_bid'].values,
grp.loc[grp['rank']==2, 'bid_price'].values)[0]
grp['custom_column'] = val
grp.loc[grp['rank']==2, 'custom_column'] = pd.np.nan
return grp
Then apply maxminbid
function on Auction_id
grouped objects
df.groupby('Auction_id').apply(maxminbid)
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 NaN
2 124 3 2 1 2
3 124 1 2 2 NaN
But, I suspect, there must be some elegant solution than this one.
Upvotes: 2