Reputation: 1725
I have a data frame which looks like this:
df=
['UserId','SessionId','Item_class']
[1 ,34 ,'toy' ]
[1 ,35 ,'book' ]
[2 ,36 ,'book' ]
Note that there is a 1:n relationship between UserId and SessionId, as one user can have multiple sessions in which they purchase an item.
I need to find out how many unique items each user purchased, with output like this:
df=
['UserId','number_items']
[1 ,2 ]
[2 ,1 ]
I found many topics that only discuss how to get the unique values of a single column:
df.Item_class.unique()
but I didn't find anything that breaks that down by another column, in this case UserId.
Hope someone can help. Thanks!
Upvotes: 0
Views: 29
Reputation: 13387
Try this one:
>>> df.groupby("UserId").Item_class.nunique()
UserId
1 2
2 1
It counts the unique Item_class values per UserId.
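If you want the result as a data frame with a number_items column, as in the question, something like the following should work. This is a minimal sketch: the sample frame is rebuilt from the rows shown in the question, and `result` is just an illustrative variable name.

```python
import pandas as pd

# Sample data rebuilt from the rows in the question
df = pd.DataFrame(
    {
        "UserId": [1, 1, 2],
        "SessionId": [34, 35, 36],
        "Item_class": ["toy", "book", "book"],
    }
)

# Count distinct Item_class values per user, then turn the
# resulting Series back into a two-column DataFrame
result = (
    df.groupby("UserId")["Item_class"]
    .nunique()
    .reset_index(name="number_items")
)
print(result)
```

reset_index(name=...) moves UserId out of the index into a regular column and names the count column in one step.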
Upvotes: 2