Reputation: 27
Is there any way to speed up groupby and aggregate on large datasets?
I have a dataframe like this:
User Category
A Cat
B Dog
C Cat
A Dog
I want to collect all categories for each user into a list, like this:
User Category
A [Cat,Dog]
B [Dog]
C [Cat]
The code I'm using for this looks like this:
df = df.groupby('User')['Category'].aggregate(
    lambda x: x.unique().tolist()).reset_index()
But the processing time for large files is too long.
Upvotes: 0
Views: 104
Reputation: 323226
Let's drop_duplicates before the groupby:
out = df.drop_duplicates().groupby('User')['Category'].agg(list)
Out[249]:
User
A [Cat, Dog]
B [Dog]
C [Cat]
Name: Category, dtype: object
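A minimal, self-contained sketch of the idea, using the sample data from the question (add .reset_index() at the end if you want User back as a column, as in the original code):

```python
import pandas as pd

# Sample frame mirroring the question's data.
df = pd.DataFrame({
    "User": ["A", "B", "C", "A"],
    "Category": ["Cat", "Dog", "Cat", "Dog"],
})

# Dropping duplicate (User, Category) rows up front shrinks the data
# before the groupby, so the cheap built-in agg(list) replaces the
# per-group Python lambda with unique().tolist().
out = (
    df.drop_duplicates()
      .groupby("User")["Category"]
      .agg(list)
      .reset_index()
)
print(out)
```

The speedup comes from two things: the deduplication is done once, vectorized, over the whole frame instead of once per group, and agg(list) avoids calling a Python lambda for every group.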
Upvotes: 1