Reputation: 1193
I have a pandas
dataframe that has a column where the data is a list of statistics calculated from a groupby operation.
df = pd.DataFrame({'a':[1,1,1,2,2,2,3], 'b':[3,4,2,3,4,3,2]})
def calculate_stuff(x):
return len(x)/5, sum(x)/len(x), sum(x)
>>> df.groupby('a').apply(lambda row : calculate_stuff(row.b))
a
1 (0, 3, 9)
2 (0, 3, 10)
3 (0, 2, 2)
dtype: object
Basically, I have several statistics that depend on each other and have to be calculated for each groupby row. The function that does this returns a tuple of the statistics values. What I want is to create a new column for each index of the tuple so that it looks like this:
a col1 col2 col3
1 0 3 9
2 0 3 10
3 0 2 2
I don't think I can use df.groupby('a').agg
because one of the calculations is required for the other calculations. Any suggestions?
edit: I realized my aggregate functions in my example were not aggregate functions so I changed them
Upvotes: 2
Views: 1125
Reputation: 109520
Adding an extra a
category item so the result is 4x3.
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 4],
'b': [3, 4, 2, 3, 4, 3, 2, 1]})
new_cols = ['col1', 'col2', 'col3']
gb = df.groupby('a').apply(lambda group: calculate_stuff(group.b))
>>> pd.DataFrame(zip(*gb), columns=gb.index, index=new_cols).T
col1 col2 col3
a
1 0 3 9
2 0 3 10
3 0 2 2
4 0 1 1
Upvotes: 1
Reputation: 862406
You can try list
comprehension:
import pandas as pd
df = pd.DataFrame({'a':[1,1,1,2,2,2,3], 'b':[3,4,2,3,4,3,2]})
def calculate_stuff(x):
return len(x)/5, sum(x)/len(x), sum(x)
group_df = df.groupby('a').apply(lambda row : calculate_stuff(row.b))
print pd.DataFrame([x for x in group_df],
columns=['col1','col2','col3'],
index=group_df.index)
col1 col2 col3
a
1 0 3 9
2 0 3 10
3 0 2 2
Upvotes: 1