Reputation: 1526
I have the following detaframe bb:
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 7278437 1.97
1 45094462 185.04 7278437 1.97
2 45094463 185.04 7278437 3.05
3 45094463 185.04 7278437 3.05
4 45094464 185.04 7278437 5.80
5 45094464 185.04 7278437 5.80
6 45094466 185.04 7278437 200.00
7 45094466 185.04 7278437 200.00
8 45094465 185.04 7278437 NaN
9 45094465 185.04 7278437 NaN
I would like to group by "market_id" and take first two lowest "bq_back_price". I managed to do this with
bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2)
The problem is that I am missing some of the columns such as "bq_selection_id", "bq_balance" and column "bq_back_price" does not have name. That is what I get
bq_market_id
7278437 0 1.97
7278437 1 1.97
And I would like to get something like this
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 7278437 1.97
1 45094462 185.04 7278437 1.97
Can you help me please?
Upvotes: 1
Views: 3100
Reputation: 862691
You can use merge
by indexes
:
print bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price')
bq_market_id bq_back_price
0 7278437 1.97
1 7278437 1.97
print pd.merge(bb[['bq_selection_id','bq_balance']],
bb.groupby('bq_market_id')['bq_back_price'].nsmallest(2).reset_index(level=0, name='bq_back_price'),
left_index=True,
right_index=True)
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 7278437 1.97
1 45094462 185.04 7278437 1.97
unutbu deleted nice answer, but I think it is better as my answer:
result = df.groupby('bq_market_id')['bq_back_price'].nsmallest(2)
idx = result.index.get_level_values(-1)
print(df.loc[idx])
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 7278437 1.97
1 45094462 185.04 7278437 1.97
Upvotes: 0
Reputation: 1302
How about appending a new "Rank" column?
bb['rank'] = bb.groupby(['bq_market_id'])['bq_back_price'].rank(ascending=True)
After that, you can filter bb for the lowest 2 prices (rank 1 and 2).
bb[bb['rank'] < 3]
Credit to: python pandas rank by column
Upvotes: 1
Reputation: 76927
You can first sort values on bq_back_price
, and then take head(2)
in each groups.
In [218]: df.sort_values('bq_back_price').groupby('bq_market_id').head(2)
Out[218]:
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 7278437 1.97
1 45094462 185.04 7278437 1.97
Upvotes: 2