tensor

Reputation: 3340

Using dictionary for dataframe aggregation

I have a large two-dimensional DataFrame with the columns: date, user_id, val1, val2.

As I need to compute complex functions for each user_id, I do the following:

user_dict = {x: {} for x in user_id_list}

for x in user_id_list:
    dfi = df[df['user_id'] == x]                  # sub-DataFrame for this user
    user_dict[x]['Newmycolname'] = my_fun(dfi)
    user_dict[x]['Newmycolname2'] = my_fun2(dfi)

# map user_dict back onto df afterwards

This is not very efficient, but it is very flexible, since I can compute any kind of function on the sub-DataFrame (dfi), and the loop is easy to parallelize... at the expense of raw speed.
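
For example, the loop parallelizes with something like this (a rough sketch; it assumes my_fun and my_fun2 are defined at module top level so they can be pickled):

    from concurrent.futures import ProcessPoolExecutor

    def process_user(item):
        x, dfi = item                             # (user_id, sub-DataFrame) pair
        # compute every per-user value on the sub-DataFrame
        return x, {'Newmycolname': my_fun(dfi), 'Newmycolname2': my_fun2(dfi)}

    with ProcessPoolExecutor() as ex:
        # groupby yields (user_id, group) pairs, so each worker gets one user
        user_dict = dict(ex.map(process_user, df.groupby('user_id')))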

Is there a way to replace the for loop with a pandas groupby / agg call that creates the new column names?

Upvotes: 1

Views: 432

Answers (1)

roman

Reputation: 117345

Yes, you can use pandas.DataFrame.groupby with GroupBy.apply, returning a pandas.Series from each group:

>>> df.groupby('user_id').apply(
...     lambda x: pd.Series(data=[my_fun(x), my_fun2(x)],
...                         index=['Newmycolname', 'Newmycolname2']))
         Newmycolname  Newmycolname2
user_id                             
1                 3.5           17.0
2                 6.0           20.0
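
For reference, the output above corresponds to one hypothetical setup like this (the data and the two functions are illustrative assumptions, not from the original post):

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'date':    pd.date_range('2015-01-01', periods=6),
...     'user_id': [1, 1, 1, 2, 2, 2],
...     'val1':    [2.0, 3.5, 5.0, 5.0, 6.0, 7.0],
...     'val2':    [4.0, 6.0, 7.0, 5.0, 7.0, 8.0],
... })
>>> def my_fun(x):      # placeholder: mean of val1 within the group
...     return x['val1'].mean()
... 
>>> def my_fun2(x):     # placeholder: sum of val2 within the group
...     return x['val2'].sum()
... 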

Without the lambda, just to give a clearer picture of what's going on:

>>> def worker(x):
...     d = [my_fun(x), my_fun2(x)]
...     i = ['Newmycolname', 'Newmycolname2']
...     return pd.Series(data=d, index=i)
... 
>>> df.groupby('user_id').apply(worker)
         Newmycolname  Newmycolname2
user_id                             
1                 3.5           17.0
2                 6.0           20.0
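
And since the question mentions mapping the results back onto df afterwards, one option (a sketch, reusing the result frame from above) is a join on user_id, which broadcasts the per-user columns to every row:

>>> res = df.groupby('user_id').apply(worker)
>>> df = df.join(res, on='user_id')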

Upvotes: 1
