Arthur Zangiev
Arthur Zangiev

Reputation: 1526

Python pandas dataframe groupby selecting columns

I have the following detaframe bb:

bq_selection_id bq_balance  bq_market_id  bq_back_price
0         45094462     185.04       7278437           1.97
1         45094462     185.04       7278437           1.97
2         45094463     185.04       7278437           3.05
3         45094463     185.04       7278437           3.05
4         45094464     185.04       7278437           5.80
5         45094464     185.04       7278437           5.80
6         45094466     185.04       7278437         200.00
7         45094466     185.04       7278437         200.00
8         45094465     185.04       7278437            NaN
9         45094465     185.04       7278437            NaN

I would like to group by "market_id" and take first two lowest "bq_back_price". I managed to do this with

bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2)

The problem is that I am missing some of the columns such as "bq_selection_id", "bq_balance" and column "bq_back_price" does not have name. That is what I get

bq_market_id   
7278437       0    1.97
7278437       1    1.97

And I would like to get something like this

bq_selection_id bq_balance  bq_market_id  bq_back_price
0         45094462     185.04       7278437           1.97
1         45094462     185.04       7278437           1.97

Can you help me please?

Upvotes: 1

Views: 3100

Answers (3)

jezrael
jezrael

Reputation: 862691

You can use merge by indexes:

print bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price')
   bq_market_id  bq_back_price
0       7278437           1.97
1       7278437           1.97

print pd.merge(bb[['bq_selection_id','bq_balance']],
               bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price'),
               left_index=True,
               right_index=True)
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

unutbu deleted nice answer, but I think it is better as my answer:

result = df.groupby('bq_market_id')['bq_back_price'].nsmallest(2)
idx = result.index.get_level_values(-1)
print(df.loc[idx])

   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

Upvotes: 0

wij
wij

Reputation: 1302

How about appending a new "Rank" column?

bb['rank'] = bb.groupby(['bq_market_id'])['bq_back_price'].rank(ascending=True)

After that, you can filter bb for the lowest 2 prices (rank 1 and 2).

bb[bb['rank'] < 3]

Credit to: python pandas rank by column

Upvotes: 1

Zero
Zero

Reputation: 76927

You can first sort values on bq_back_price, and then take head(2) in each groups.

In [218]: df.sort_values('bq_back_price').groupby('bq_market_id').head(2)
Out[218]:
   bq_selection_id  bq_balance  bq_market_id  bq_back_price
0         45094462      185.04       7278437           1.97
1         45094462      185.04       7278437           1.97

Upvotes: 2

Related Questions