Reputation: 275
I would like to create a groupby dataframe and group row in multiple lists : I have the following dataframe :
list_date = [1,1,2,2,2,3,3,3,4,5]
list_request = [2,2,2,3,3,2,3,3,3,3]
list_users = [1,3,7,1,7,3,4,9,7,9]
list_count = [1,1,2,3,1,3,1,2,1,1]
df = pd.DataFrame({'date':list_date,'request':list_request,'users':list_users, 'count':list_count})
count date request users
0 1 1 2 1
1 1 1 2 3
2 2 2 2 7
3 3 2 3 1
4 1 2 3 7
5 3 3 2 3
6 1 3 3 4
7 2 3 3 9
8 1 4 3 7
9 1 5 3 9
and I want to have the following result (grouping by ['date','request']):
count date request users
0 [1, 1] 1 2 [1, 3]
1 [2] 2 2 [7]
2 [3, 1] 2 3 [1, 7]
3 [3] 3 2 [3]
4 [1, 2] 3 3 [4, 9]
5 [1] 4 3 [7]
6 [1] 5 3 [9]
If I use the solution here : grouping rows in list in pandas groupby I can create 1 list but how can I create 2 lists at the same time ? Something like :
df = df.groupby(['date','request'])['users','count'].apply(list).reset_index()
And I need something efficient as in reality I have millions of rows...
Thank you!
Upvotes: 0
Views: 250
Reputation: 402814
You can apply pd.Series
and then rename your columns:
In [299]: df.groupby(['date', 'request']).apply(lambda x: pd.Series((x['count'].tolist(), x['users'].tolist()))).reset_index().rename(columns={0 : 'count', 1 : 'users'})
Out[299]:
date request count users
0 1 2 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3, 1] [1, 7]
3 3 2 [3] [3]
4 3 3 [1, 2] [4, 9]
5 4 3 [1] [7]
6 5 3 [1] [9]
Another solution is applying list with df.apply
:
In [364]: grouped_df = df.groupby(['date', 'request'])
In [365]: pd.DataFrame({ 'count' : grouped_df['count'].apply(list), 'users' : grouped_df['users'].apply(list) }).reset_index() #, columns=['count', 'users'])
Out[365]:
date request count users
0 1 2 [1, 1] [1, 3]
1 2 2 [2] [7]
2 2 3 [3, 1] [1, 7]
3 3 2 [3] [3]
4 3 3 [1, 2] [4, 9]
5 4 3 [1] [7]
6 5 3 [1] [9]
Better solutions may be seen here (I asked a question): Converting each grouped column in DataFrameGroupBy object to a list
Upvotes: 1