Amy21

Reputation: 291

Python: select only users who have purchased more than x items

I need to select only the users who have purchased more than x items.

For example, I have the following table:

    users       items
-----------------------
    1            a
    1            b 
    1            c
    1            d 
    2            N  
    2            M

I want to keep only the users who have purchased more than 3 items. In this example, I would keep only user 1 and not user 2, who has purchased only 2 items.

For now, I apply this condition in the SQL script whose output is the input of my Python script, but since that takes a long time, I was hoping to do it directly in my Python script.

Thank you in advance.

Regards,

Upvotes: 1

Views: 134

Answers (2)

jezrael

Reputation: 862651

You need to use filter:

df = df.groupby('users').filter(lambda x: len(x['items']) > 3)
print (df)
   users items
0      1     a
1      1     b
2      1     c
3      1     d

Another solution with transform and boolean indexing:

df = df[df.groupby('users')['items'].transform('size') > 3]
print (df)
   users items
0      1     a
1      1     b
2      1     c
3      1     d
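For anyone who wants to run both approaches end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question, and the names min_items, kept_filter, group_sizes and kept_transform are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    "users": [1, 1, 1, 1, 2, 2],
    "items": ["a", "b", "c", "d", "N", "M"],
})

min_items = 3  # illustrative threshold: keep users with MORE than this many rows

# Approach 1: keep every row of the groups that have more than min_items rows
kept_filter = df.groupby("users").filter(lambda g: len(g) > min_items)

# Approach 2: broadcast each group's size onto its rows, then use a boolean mask
group_sizes = df.groupby("users")["items"].transform("size")
kept_transform = df[group_sizes > min_items]

print(kept_filter)
print(kept_transform)
```

Both variants count rows per user, so a repeated item would be counted each time it appears. On larger frames the transform version is often the faster of the two because it avoids calling a Python lambda per group, but that is worth measuring on your own data.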

Upvotes: 1

Fabio Lamanna

Reputation: 21552

You can group by users and then count the number of items purchased:

In [8]: d = df.groupby('users').size().reset_index(name='items_purchased')

In [9]: d
Out[9]: 
   users  items_purchased
0      1                4
1      2                2

and then filter by a mask:

In [10]: fil = 3

In [11]: d = d[d['items_purchased']>fil]

In [12]: d
Out[12]: 
   users  items_purchased
0      1                4
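Note that d above holds one summary row per user, not the original purchase rows. If the original rows are what you need, one possible follow-up (a sketch, assuming the same df as in the question; the names counts, frequent and result are only illustrative) is to select them with isin:

```python
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    "users": [1, 1, 1, 1, 2, 2],
    "items": ["a", "b", "c", "d", "N", "M"],
})

fil = 3  # same threshold as in the answer

# One row per user with the number of purchases
counts = df.groupby("users").size().reset_index(name="items_purchased")

# Users above the threshold
frequent = counts[counts["items_purchased"] > fil]

# Keep the original rows that belong to those users
result = df[df["users"].isin(frequent["users"])]
print(result)
```

This keeps the per-user counts available in counts for reporting while still returning the purchase rows for the selected users.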

Upvotes: 0
