Reputation: 91

pandas dataframe: how to group by value?

I am working with the following pandas dataframe (from csv file):

----------------------------------------------
buyer_id       itemX   itemY      bid1  bid2
---------------------------------------------- 
buyer1         item1   item2       52   32 
buyer2         item1   item2       15   17 
.. .. 
buyer500       item1   item2       82   13

I have 500 buyers bidding for item1 and item2, the following 200 buyers bid for item3 and item4

buyer600       item3   item4       63   82
..
..
buyer800       item3   item4       40   12

I sum Bid1 and bid2 so I can sort to find max bids, Now I want to find and keep the max 5 bids only for each group.. group1(item 1 + item2) and group2(item3 + item4). How can I do it?

Here is my code so far,

import pandas as pd

df = pd.read)csv('myfile.csv')
sum = (df[df.columns[4:]].sum(1))
df['sum'] = calc
df.sort_values(['sort of sum'], ascending=[False], inplace=True)
df.to_csv('results.csv')

output should be something like

-----------------------------------------
buyer    itemX   itemY   sum of bid 
---------------------------------‌-------- 
buyer13  item1   item2     350 
buyer2   item1   item2     283 
buyer65  item1   item2     236         
buyer602 item3   item4     80 
buyer703 item3   item4     76 
buyer640 item3   item4     69

Upvotes: 0

Answers (3)

Patrick Hingston

Reputation: 292

One way to do this would be to create two dataframes, one for the item1 & 2 buyers, and another for the item3 & 4 buyers

df1 = df[df['itemX'] == 'item1']
df2 = df[df['itemX'] == 'item3']

Then you can create a new column which sums the bids

df1['sum_bids'] = df1['bid1'] + df1['bid2']

Then sort the dataframes using sort_values

sorted_df1 = df1.sort_values(['sum_bids'], ascending=False)

Then you can re-index them

sorted_df1.index = range(1,len(sorted_df1) + 1)

Then select first 5 rows

max_bids = sorted_df1[:5]

Upvotes: 2

Allen Qin

Reputation: 19947

#create a total column, sort by item name and total bid, then take the top 5 for each group.
df.assign(total = df.bid1+df.bid2).sort_values(['itemX','total'],ascending=False).groupby('itemX').head(5)
Out[2566]: 
    buyer_id  itemX  itemY  bid1  bid2  total
12   buyer61  item3  item4    44    60    104
11   buyer51  item3  item4    52    32     84
8    buyer21  item3  item4    15    55     70
10   buyer41  item3  item4    40    17     57
9    buyer31  item3  item4    52     3     55
6   buyer500  item1  item2    82    13     95
0     buyer1  item1  item2    52    32     84
2     buyer3  item1  item2    52    32     84
4     buyer5  item1  item2    52    32     84
5     buyer6  item1  item2    15    60     75

Upvotes: 1

Librarian

Reputation: 198

First I'm going to make a sample dataframe that I think is similar to the one you describe.

bids_df = pd.DataFrame({'buyer_id': ['buyer' + str(i) for i in range(1, 501)] + ['buyer' + str(i) for i in range(600, 800)],
                    'itemX': ['item1'] * 500 + ['item3'] * 200,
                    'itemY': ['item2'] * 500 + ['item4'] * 200,
                    'bid1': [randint(10, 100) for _ in range(700)],
                    'bid2': [randint(10, 100) for _ in range(700)]})
bids_df = bids_df[['buyer_id', 'itemX', 'itemY', 'bid1', 'bid2']]

Then we can add a column for the sum of bids and sort: you've done this much in your question already.

bids_df['bid_sum'] = bids_df['bid1'] + bids_df['bid2']
bids_df = bids_df.sort_values(by=['bid_sum'], ascending=False)

Finally we can group the dataframe by item as you describe, and simply get the top 5 rows from each group (since we've already sorted them by bid sum).

bids_df.groupby(lambda x: 'grp1' if bids_df.loc[x, 'itemX'] == 'item1' else 'grp2')\
    .head(5)\
    .sort_values(by=['itemX', 'bid_sum'], ascending=[True, False])

This gives the resulting dataframe:

     buyer_id  itemX  itemY  bid1  bid2  bid_sum
60    buyer61  item1  item2    99    97      196
498  buyer499  item1  item2    98    97      195
470  buyer471  item1  item2    92    99      191
120  buyer121  item1  item2    98    93      191
50    buyer51  item1  item2   100    91      191
573  buyer673  item3  item4   100    94      194
639  buyer739  item3  item4    90    95      185
512  buyer612  item3  item4    89    94      183
691  buyer791  item3  item4   100    78      178
659  buyer759  item3  item4    87    91      178

Upvotes: 1

pandas dataframe: how to group by value?

Answers (3)

Related Questions