Reputation: 95
I have a dataframe in the following format:
UserId, CurrentUserLocationId, RegisteredUserLocationId, RestorauntId
I want to count the number of distinct values of the key (UserId, CurrentUserLocationId, RegisteredUserLocationId).
For example, if the combination (1, 1, 1) appears at least once, it should contribute exactly 1 to the final result, no matter how many rows contain it. In other words, each unique combination should be counted only once.
I tried groupby(['col1', 'col2', 'col3']).size(), but that counts the records in each group rather than the number of distinct groups. The dataset I will run this on has a billion records.
Is there a built-in way to accomplish what I'm trying to do? Or to be more precise, what's the fastest way to do this sort of counting?
Upvotes: 0
Views: 744
Reputation: 638
Use DataFrame.drop_duplicates() and then DataFrame.count on the result.
If necessary, copy the dataframe before dropping duplicates, and when making the copy select only the columns whose combinations you want to be unique.
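A minimal sketch of that approach, assuming pandas and a small made-up sample (the rows below are hypothetical; only the column names come from the question). Selecting the key columns first produces a new frame, so the original dataframe is not modified:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "UserId": [1, 1, 2, 2, 3],
    "CurrentUserLocationId": [1, 1, 5, 5, 7],
    "RegisteredUserLocationId": [1, 1, 5, 6, 7],
    "RestorauntId": [10, 11, 12, 13, 14],
})

key_cols = ["UserId", "CurrentUserLocationId", "RegisteredUserLocationId"]

# Keep only the key columns, then drop repeated combinations.
unique_keys = df[key_cols].drop_duplicates()

# Each remaining row is one distinct key, so the row count is the answer.
n_unique = len(unique_keys)

# Alternative: the number of groups formed by the same key columns.
n_groups = df.groupby(key_cols).ngroups

print(n_unique, n_groups)
```

Whether this is fast enough for a billion rows depends on available memory; that part is not something this sketch measures.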
Upvotes: 2