Reputation: 149
I have a DataFrame with a column that contains a list of values. Every row's list has the same length. I'd like to use DataFrame.groupby to group the rows and sum the lists element-wise within each group, like this:
In:
import pandas as pd
#Sample data
a = pd.DataFrame([['a', 'test', list([0,1,2,3,4])],['b', 'test', list([5,6,7,8,9])]], columns=['id', 'grp', 'values'])
print(a)
#Some function to group the dataframe
#b = a.groupby('grp').someAggregationFunction()
#Example of desired output
b = pd.DataFrame([['test', list([5,7,9,11,13])]], columns=['grp', 'values'])
print(b)
Out:
id grp values
0 a test [0, 1, 2, 3, 4]
1 b test [5, 6, 7, 8, 9]
grp values
0 test [5, 7, 9, 11, 13]
Upvotes: 3
Views: 289
Reputation: 323326
You can do it in one line:
a.groupby('grp')['values'].apply(lambda x : pd.DataFrame(x.values.tolist()).sum().tolist())
Out[286]:
grp
test [5, 7, 9, 11, 13]
Name: values, dtype: object
Also, I recommend not using apply here:
b=pd.DataFrame(a['values'].values.tolist()).groupby(a['grp']).sum()
pd.DataFrame({'grp':b.index,'values':b.values.tolist()})
Out[293]:
grp values
0 test [5, 7, 9, 11, 13]
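For reference, here is a self-contained sketch of the same apply-free idea, broken into steps. It assumes the sample frame `a` from the question; the intermediate names `expanded`, `summed` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question.
a = pd.DataFrame(
    [['a', 'test', [0, 1, 2, 3, 4]], ['b', 'test', [5, 6, 7, 8, 9]]],
    columns=['id', 'grp', 'values'],
)

# Expand each list into numeric columns, one column per list position.
expanded = pd.DataFrame(a['values'].tolist(), index=a.index)

# Group those numeric columns by the 'grp' labels and sum column-wise.
summed = expanded.groupby(a['grp']).sum()

# Pack each summed row back into a single list per group.
result = pd.DataFrame({'grp': summed.index, 'values': summed.values.tolist()})
print(result)
```

The grouped sum runs on plain numeric columns, so no Python-level function is called per group.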
Upvotes: 2
Reputation: 164773
You may not like this answer, but it's better not to use lists in dataframes. You should seek, wherever possible, to use numeric series for numeric data:
res = df.join(pd.DataFrame(df.pop('values').tolist()))\
.groupby('grp').sum().reset_index()
print(res)
grp 0 1 2 3 4
0 test 5 7 9 11 13
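A possible variant of the above, as a sketch only: it assumes the frame is named `a` as in the question, leaves `a` unmodified (no `pop`), and sums only the list positions, so the result does not depend on how a given pandas version treats the string `id` column in a grouped sum. The name `wide` is made up for illustration.

```python
import pandas as pd

# Sample data from the question.
a = pd.DataFrame(
    [['a', 'test', [0, 1, 2, 3, 4]], ['b', 'test', [5, 6, 7, 8, 9]]],
    columns=['id', 'grp', 'values'],
)

# One numeric column per list position, aligned to the original rows.
wide = pd.DataFrame(a['values'].tolist(), index=a.index)

# Attach the grouping key, then sum each position within each group.
res = wide.assign(grp=a['grp']).groupby('grp').sum().reset_index()
print(res)
```

Keeping the data in this wide numeric form means later operations (means, filters, arithmetic) stay vectorised as well.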
Upvotes: 3
Reputation: 51165
Using numpy.stack:
pd.DataFrame(
[(i, np.stack(g).sum(0)) for i, g in a.groupby('grp')['values']],
columns=['grp', 'values']
)
grp values
0 test [5, 7, 9, 11, 13]
Also using apply, but apply will be slow:
a.groupby('grp')['values'].apply(lambda x: np.stack(x).sum(0)).to_frame('values')
values
grp
test [5, 7, 9, 11, 13]
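The snippets above need `import numpy as np` and return NumPy arrays in the `values` column. If plain lists are wanted, as in the question's desired output, a self-contained sketch could look like the following; it assumes the question's sample frame `a`, and `rows` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
a = pd.DataFrame(
    [['a', 'test', [0, 1, 2, 3, 4]], ['b', 'test', [5, 6, 7, 8, 9]]],
    columns=['id', 'grp', 'values'],
)

rows = []
for key, group in a.groupby('grp')['values']:
    # np.stack builds a 2-D array with one row per list in the group;
    # summing along axis 0 adds the lists position by position.
    total = np.stack(group.tolist()).sum(axis=0)
    # Convert the array back to a plain Python list.
    rows.append((key, total.tolist()))

result = pd.DataFrame(rows, columns=['grp', 'values'])
print(result)
```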
Upvotes: 1
Reputation: 59274
One solution is to transform your lists into np.arrays and use a simple sum:
a['v'] = a.v.transform(np.array)
a.groupby('grp').v.apply(lambda x: x.sum())
grp v
0 test [5, 7, 9, 11, 13]
Notice that I renamed values to v so that it is not mistaken for the .values accessor from pd.DataFrame.
Upvotes: 1