create new columns from a list of columns in pandas

Question

I have a pandas dataframe that has a column where the data is a list of statistics calculated from a groupby operation.

df = pd.DataFrame({'a':[1,1,1,2,2,2,3], 'b':[3,4,2,3,4,3,2]}) 
def calculate_stuff(x):
    return len(x)/5, sum(x)/len(x), sum(x)
>>> df.groupby('a').apply(lambda row : calculate_stuff(row.b))
a
1               (0, 3, 9)
2               (0, 3, 10)
3               (0, 2, 2)
dtype: object

Basically, I have several statistics that depend on each other and have to be calculated for each groupby row. The function that does this returns a tuple of the statistics values. What I want is to create a new column for each index of the tuple so that it looks like this:

a    col1    col2    col3
1    0       3       9
2    0       3       10
3    0       2       2

I don't think I can use df.groupby('a').agg because one of the calculations is required for the other calculations. Any suggestions?

edit: I realized my aggregate functions in my example were not aggregate functions so I changed them

Alexander · Accepted Answer

Adding an extra a category item so the result is 4x3.

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 4], 
                   'b': [3, 4, 2, 3, 4, 3, 2, 1]})

new_cols = ['col1', 'col2', 'col3']

gb = df.groupby('a').apply(lambda group: calculate_stuff(group.b))

>>> pd.DataFrame(zip(*gb), columns=gb.index, index=new_cols).T
   col1  col2  col3
a                  
1     0     3     9
2     0     3    10
3     0     2     2
4     0     1     1

create new columns from a list of columns in pandas

Answers (2)

Related Questions