Reputation: 803
I have a dataframe with user_id
and order_number
columns. order_number
tells the nth order by a user. I want to select users who have done certain number of orders.
Sample DataFrame:
user_id order_number
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 3 1
6 3 2
7 3 3
Output: [1,3]
Output should be user_id
[1,3]
because they have done 3 orders but 2 have done only 2 orders.
I am trying:
(df.groupby(['user_id'])['order_number'].max()==3)
This gives me boolean series but how to select index with only True values?
Upvotes: 1
Views: 298
Reputation: 75080
A general way of doing this is by using df.loc[]
or df.query
:
df.groupby(['user_id'],as_index=False)['order_number'].max().query("order_number==3")
#or
df.groupby(['user_id'],as_index=False)['order_number'].max().loc[
lambda x: x['order_number']==3]
For this example you dont have to get max of another column, you can just count them, as @Steven suggests :
df.groupby('user_id',as_index=False).count().query("order_number==3")
Or as @Wen suggests:
df['user_id'].value_counts().loc[lambda x: x==3]
user_id order_number
0 1 3
2 3 3
Upvotes: 6