Daveed

Reputation: 149

Using pandas groupby, how can you aggregate a column of lists using addition?

I have a dataframe with a column that contains lists of values. Every row's list has the same length. I'd like to use DataFrame.groupby to group the rows and sum the lists element-wise, as follows:

In:

import pandas as pd

#Sample data
a = pd.DataFrame([['a', 'test', list([0,1,2,3,4])],['b', 'test', list([5,6,7,8,9])]], columns=['id', 'grp', 'values'])
print(a)

#Some function to group the dataframe
#b = a.groupby('grp').someAggregationFunction()

#Example of desired output
b = pd.DataFrame([['test', list([5,7,9,11,13])]], columns=['grp', 'values'])
print(b)

Out:

  id   grp           values
0  a  test  [0, 1, 2, 3, 4]
1  b  test  [5, 6, 7, 8, 9]

    grp             values
0  test  [5, 7, 9, 11, 13]

Upvotes: 3

Views: 289

Answers (4)

BENY

Reputation: 323326

You can do it in one line:

a.groupby('grp')['values'].apply(lambda x : pd.DataFrame(x.values.tolist()).sum().tolist())
Out[286]: 
grp
test    [5, 7, 9, 11, 13]
Name: values, dtype: object

I'd also recommend not using apply here:

b=pd.DataFrame(a['values'].values.tolist()).groupby(a['grp']).sum()
pd.DataFrame({'grp':b.index,'values':b.values.tolist()})
Out[293]: 
    grp             values
0  test  [5, 7, 9, 11, 13]
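
As a self-contained sketch of the no-apply version, here is the same idea on a frame with more than one group. The extra 'other' row and the names wide, summed and result are my own additions for illustration, not part of the question.

```python
import pandas as pd

# Sample data; the 'other' group is an assumption added to show several groups
a = pd.DataFrame(
    [['a', 'test', [0, 1, 2, 3, 4]],
     ['b', 'test', [5, 6, 7, 8, 9]],
     ['c', 'other', [1, 1, 1, 1, 1]]],
    columns=['id', 'grp', 'values'],
)

# Expand each list into its own numeric columns (one column per list position)
wide = pd.DataFrame(a['values'].tolist(), index=a.index)

# Group the numeric columns by the original 'grp' labels and sum column-wise
summed = wide.groupby(a['grp']).sum()

# Pack each summed row back into a single list-valued column
result = pd.DataFrame({'grp': summed.index, 'values': summed.values.tolist()})
print(result)
```

The sum runs on plain numeric columns, so the grouping itself never has to touch Python lists; they are only rebuilt in the last step.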

Upvotes: 2

jpp

Reputation: 164773

You may not like this answer, but it's better not to use lists in dataframes. You should seek, wherever possible, to use numeric series for numeric data:

# df is the question's dataframe (called a there); pop removes 'values' from it in place
res = df.join(pd.DataFrame(df.pop('values').tolist()))\
        .groupby('grp').sum().reset_index()

print(res)

    grp  0  1  2   3   4
0  test  5  7  9  11  13
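
A self-contained sketch of the same approach, with two assumptions of mine: it uses drop instead of pop so the original frame is left untouched, and it passes numeric_only=True, because as far as I know recent pandas versions may otherwise also try to sum the string id column. The names expanded and wide are illustrative only.

```python
import pandas as pd

a = pd.DataFrame(
    [['a', 'test', [0, 1, 2, 3, 4]],
     ['b', 'test', [5, 6, 7, 8, 9]]],
    columns=['id', 'grp', 'values'],
)

# One numeric column per list position, aligned on the original index
expanded = pd.DataFrame(a['values'].tolist(), index=a.index)

# Swap the list column for the numeric columns; 'a' itself is not modified
wide = a.drop(columns='values').join(expanded)

# Sum only the numeric columns within each group
res = wide.groupby('grp').sum(numeric_only=True).reset_index()
print(res)
```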

Upvotes: 3

user3483203

Reputation: 51165

Using numpy.stack:

pd.DataFrame(
    [(i, np.stack(g).sum(0)) for i, g in a.groupby('grp')['values']],
    columns=['grp', 'values']
)

    grp             values
0  test  [5, 7, 9, 11, 13]

You can also use apply, but apply will be slow:

a.groupby('grp')['values'].apply(lambda x: np.stack(x).sum(0)).to_frame('values')

                 values
grp
test  [5, 7, 9, 11, 13]
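
Note that both versions leave NumPy arrays, not Python lists, in the values column. If you need real lists back, a minimal sketch would be the following; the extra 'other' row and the .tolist() step are my additions.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    [['a', 'test', [0, 1, 2, 3, 4]],
     ['b', 'test', [5, 6, 7, 8, 9]],
     ['c', 'other', [1, 2, 3, 4, 5]]],
    columns=['id', 'grp', 'values'],
)

rows = []
for key, group in a.groupby('grp')['values']:
    # Stack the group's equal-length lists into a 2-D array (one row per original row)
    stacked = np.stack(group.tolist())
    # Sum down the rows, then convert the 1-D result back to a plain list
    rows.append((key, stacked.sum(axis=0).tolist()))

out = pd.DataFrame(rows, columns=['grp', 'values'])
print(out)
```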

Upvotes: 1

rafaelc

Reputation: 59274

One solution is to convert your lists to np.arrays and use a simple sum:

import numpy as np

a = a.rename(columns={'values': 'v'})  # see the note below
a['v'] = a.v.transform(np.array)
a.groupby('grp').v.apply(lambda x: x.sum())

    grp     v
0   test    [5, 7, 9, 11, 13]

Note that I renamed the column from values to v so that a.v is not confused with the .values attribute of pd.DataFrame.

Upvotes: 1
