mnky9800n
mnky9800n

Reputation: 1193

create new columns from a list of columns in pandas

I have a pandas dataframe that has a column where the data is a list of statistics calculated from a groupby operation.

df = pd.DataFrame({'a':[1,1,1,2,2,2,3], 'b':[3,4,2,3,4,3,2]}) 
def calculate_stuff(x):
    return len(x)/5, sum(x)/len(x), sum(x)
>>> df.groupby('a').apply(lambda row : calculate_stuff(row.b))
a
1               (0, 3, 9)
2               (0, 3, 10)
3               (0, 2, 2)
dtype: object

Basically, I have several statistics that depend on each other and have to be calculated for each groupby row. The function that does this returns a tuple of the statistics values. What I want is to create a new column for each index of the tuple so that it looks like this:

a    col1    col2    col3
1    0       3       9
2    0       3       10
3    0       2       2

I don't think I can use df.groupby('a').agg because one of the calculations is required for the other calculations. Any suggestions?

edit: I realized my aggregate functions in my example were not aggregate functions so I changed them

Upvotes: 2

Views: 1125

Answers (2)

Alexander
Alexander

Reputation: 109520

Adding an extra a category item so the result is 4x3.

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 4], 
                   'b': [3, 4, 2, 3, 4, 3, 2, 1]})

new_cols = ['col1', 'col2', 'col3']

gb = df.groupby('a').apply(lambda group: calculate_stuff(group.b))

>>> pd.DataFrame(zip(*gb), columns=gb.index, index=new_cols).T
   col1  col2  col3
a                  
1     0     3     9
2     0     3    10
3     0     2     2
4     0     1     1

Upvotes: 1

jezrael
jezrael

Reputation: 862406

You can try list comprehension:

import pandas as pd

df = pd.DataFrame({'a':[1,1,1,2,2,2,3], 'b':[3,4,2,3,4,3,2]}) 

def calculate_stuff(x):
    return len(x)/5, sum(x)/len(x), sum(x)

group_df = df.groupby('a').apply(lambda row : calculate_stuff(row.b))

print pd.DataFrame([x for x in group_df], 
                    columns=['col1','col2','col3'], 
                    index=group_df.index)

   col1  col2  col3
a                  
1     0     3     9
2     0     3    10
3     0     2     2

Upvotes: 1

Related Questions