Christopher Jenkins
Christopher Jenkins

Reputation: 875

Pandas MAX formula across different grouped rows

I have dataframe that looks like this:

Auction_id  bid_price  min_bid  rank
123         5          3        1
123         4          3        2
124         3          2        1
124         1          2        2

I'd like to create another column that returns MAX(rank 1 min_bid, rank 2 bid_price). I don't care what appears for the rank 2 column values. I'm hoping for the result to look something like this:

Auction_id  bid_price  min_bid  rank  custom_column
123         5          3        1     4
123         4          3        2     NaN/Don't care
124         3          2        1     2
124         1          2        2     NaN/Don't care

Should I be iterating through grouped auction_ids? Can someone provide the topics one would need to be familiar with to tackle this type of problem?

Upvotes: 0

Views: 91

Answers (3)

Bob Haffner
Bob Haffner

Reputation: 8493

Here's an approach that does some reshaping with pivot()

   Auction_id  bid_price  min_bid  rank  
0         123          5        3     1              
1         123          4        3     2           
2         124          3        2     1            
3         124          1        2     2   

Then reshape your frame (df)

pv = df.pivot("Auction_id","rank")
pv
                   bid_price    min_bid   
rank               1    2       1  2
Auction_id                        
123                5    4       3  3
124                3    1       2  2

Adding a column to pv that contains the max. I"m using iloc to get a slice of the pv dataframe.

    pv["custom_column"]  = pv.iloc[:,[1,2]].max(axis=1)
    pv

                 bid_price    min_bid    custom_column
rank               1  2       1  2              
Auction_id                                      
123                5  4       3  3             4
124                3  1       2  2             2

and then add the max to the original frame (df) by mapping to our pv frame

df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df

  Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2            NaN
2         124          3        2     1              2
3         124          1        2     2            NaN

all the steps combined

pv = df.pivot("Auction_id","rank")
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df

  Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2            NaN
2         124          3        2     1              2
3         124          1        2     2            NaN

Upvotes: 2

Alexander
Alexander

Reputation: 109726

First, set the index equal to the Auction_id. Then you can use loc to select the appropriate values for each Auction_id and use max on their values. Finally, reset your index to return to your initial state.

df.set_index('Auction_id', inplace=True)
df['custom_column'] = pd.concat([df.loc[df['rank'] == 1, 'min_bid'],
                                 df.loc[df['rank'] == 2, 'bid_price']],
                                axis=1).max(axis=1)
df.reset_index(inplace=True)
>>> df
   Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2              4
2         124          3        2     1              2
3         124          1        2     2              2

Upvotes: 2

Zero
Zero

Reputation: 77027

Here's one crude way to do it.

Create maxminbid() function, which creates a val= MAX(rank 1 min_bid, rank 2 bid_price) and assign this to grp['custom_column'], and for rank==2 store it with NaN

def maxminbid(grp):
    val = max(grp.loc[grp['rank']==1, 'min_bid'].values,
              grp.loc[grp['rank']==2, 'bid_price'].values)[0]
    grp['custom_column'] = val
    grp.loc[grp['rank']==2, 'custom_column'] = pd.np.nan
    return grp

Then apply maxminbid function on Auction_id grouped objects

df.groupby('Auction_id').apply(maxminbid)


   Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2            NaN
2         124          3        2     1              2
3         124          1        2     2            NaN

But, I suspect, there must be some elegant solution than this one.

Upvotes: 2

Related Questions