Reputation: 3340
I have a large 2 dimensional dataframe like this: date, user_id, val1, val2
As I need to compute complex functions for each user_id, I do the following:
for x in user_id_list:
    dfi = df[df['user_id'] == x]
    user_dict[x]['Newmycolname'] = my_fun(dfi)
    user_dict[x]['Newmycolname2'] = my_fun2(dfi)
# map user_dict back onto df afterwards
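A runnable sketch of the loop above, including the "map back to df" step. The bodies of my_fun / my_fun2 are hypothetical stand-ins (the question leaves them abstract), here the mean of val1 and the sum of val2:

```python
import pandas as pd

# Hypothetical stand-ins for the question's arbitrary per-user functions.
def my_fun(sub):
    return sub['val1'].mean()

def my_fun2(sub):
    return sub['val2'].sum()

df = pd.DataFrame({
    'user_id': [1, 1, 2],
    'val1':    [3.0, 4.0, 5.0],
    'val2':    [8.0, 9.0, 11.0],
})

# One pass per user, collecting scalar results in a dict of dicts.
user_dict = {}
for x in df['user_id'].unique():
    dfi = df[df['user_id'] == x]
    user_dict[x] = {
        'Newmycolname':  my_fun(dfi),
        'Newmycolname2': my_fun2(dfi),
    }

# Map the per-user results back onto the original frame.
df['Newmycolname'] = df['user_id'].map(lambda u: user_dict[u]['Newmycolname'])
df['Newmycolname2'] = df['user_id'].map(lambda u: user_dict[u]['Newmycolname2'])
```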
This is not very efficient, but it is very flexible since I can compute any kind of function on each sub-DataFrame (dfi), and the code is easy to parallelize... at the expense of speed.
Is there a way to replace the for loop with a pandas groupby.agg (or similar) call that also creates the new column names?
Upvotes: 1
Views: 432
Reputation: 117345
Yes, you can use pandas.DataFrame.groupby and call .apply on each group with a function that returns a pandas.Series:
>>> df.groupby('user_id').apply(
...     lambda x: pd.Series(
...         data=[my_fun(x), my_fun2(x)],
...         index=['Newmycolname', 'Newmycolname2']))
Newmycolname Newmycolname2
user_id
1 3.5 17.0
2 6.0 20.0
Here is the same thing without a lambda, to give a clearer picture of what's going on:
>>> def worker(x):
... d = [my_fun(x), my_fun2(x)]
... i = ['Newmycolname', 'Newmycolname2']
... return pd.Series(data=d, index=i)
...
>>> df.groupby('user_id').apply(worker)
Newmycolname Newmycolname2
user_id
1 3.5 17.0
2 6.0 20.0
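To make the snippet above reproducible end to end, here is a self-contained sketch; my_fun / my_fun2 are hypothetical stand-ins (mean of val1, sum of val2) chosen so the sample data yields the output shown above:

```python
import pandas as pd

# Hypothetical stand-ins for the abstract per-user functions.
def my_fun(sub):
    return sub['val1'].mean()

def my_fun2(sub):
    return sub['val2'].sum()

df = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'val1':    [3.0, 4.0, 5.0, 7.0],
    'val2':    [8.0, 9.0, 11.0, 9.0],
})

def worker(sub):
    # Each group's sub-frame comes in; return a Series whose index
    # becomes the new column names in the result.
    return pd.Series(
        data=[my_fun(sub), my_fun2(sub)],
        index=['Newmycolname', 'Newmycolname2'],
    )

result = df.groupby('user_id').apply(worker)
print(result)
#          Newmycolname  Newmycolname2
# user_id
# 1                 3.5           17.0
# 2                 6.0           20.0
```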
Upvotes: 1