Reputation: 291
I need to select only the users who have purchased more than x items.
For example, I have a table like this:
users items
-----------------------
1 a
1 b
1 c
1 d
2 N
2 M
I want to keep only the users who have purchased more than 3 items. In this example I would keep only user 1, and not user 2, who has purchased only 2 items.
For now I apply this condition in the SQL script whose output is the input of my Python script, but since that takes a long time I was hoping to do it directly in Python.
Thank you in advance.
Regards,
Upvotes: 1
Views: 134
Reputation: 862651
You need to use filter on the groupby object:
df = df.groupby('users').filter(lambda x: len(x['items']) > 3)
print (df)
users items
0 1 a
1 1 b
2 1 c
3 1 d
Another solution uses transform and boolean indexing:
df = df[df.groupby('users')['items'].transform('size') > 3]
print (df)
users items
0 1 a
1 1 b
2 1 c
3 1 d
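For anyone who wants to run both approaches end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt from the table in the question, and the name min_items is just a placeholder I picked for the threshold, not something from the original post.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "users": [1, 1, 1, 1, 2, 2],
    "items": ["a", "b", "c", "d", "N", "M"],
})

min_items = 3  # placeholder name for the threshold "x"

# Approach 1: keep whole groups whose row count exceeds the threshold
kept_filter = df.groupby("users").filter(lambda g: len(g) > min_items)

# Approach 2: broadcast each group's size back onto its rows,
# then use the comparison result as a boolean mask
mask = df.groupby("users")["items"].transform("size") > min_items
kept_mask = df[mask]

print(kept_filter)
print(kept_mask)
```

Both variants count rows per user. If the same item can appear more than once for a user and only distinct items should count, transform("nunique") in place of transform("size") is probably closer to what you want.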
Upvotes: 1
Reputation: 21552
You can group by users and then count the number of items purchased:
In [8]: d = df.groupby('users').size().reset_index(name='items_purchased')
In [9]: d
Out[9]:
users items_purchased
0 1 4
1 2 2
and then filter by a mask:
In [10]: fil = 3
In [11]: d = d[d['items_purchased']>fil]
In [12]: d
Out[12]:
users items_purchased
0 1 4
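Note that d now holds one row per user, not the original purchase rows. If you need the purchase rows back, one possible way (a sketch, assuming the same sample data as in the question; big_buyers and rows are names I made up for illustration) is to select them with isin:

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "users": [1, 1, 1, 1, 2, 2],
    "items": ["a", "b", "c", "d", "N", "M"],
})

fil = 3

# One row per user with the number of purchases
d = df.groupby("users").size().reset_index(name="items_purchased")

# Users whose purchase count exceeds the threshold
big_buyers = d.loc[d["items_purchased"] > fil, "users"]

# Original purchase rows belonging to those users
rows = df[df["users"].isin(big_buyers)]
print(rows)
```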
Upvotes: 0