Ajay Chinni

Reputation: 850

Pandas groupby aggregate list

I have this DataFrame:

import pandas as pd
df = pd.DataFrame({'upc': [1, 1, 1], 'store': [1, 2, 3], 'date': ['jan', 'jan', 'jan'], 'pred': [[1, 1, 1], [2, 2, 2], [3, 3, 3]], 'act': [[4, 4, 4], [5, 5, 5], [6, 6, 6]]})

It looks like this:

   upc  store date       pred        act
0    1      1  jan  [1, 1, 1]  [4, 4, 4]
1    1      2  jan  [2, 2, 2]  [5, 5, 5]
2    1      3  jan  [3, 3, 3]  [6, 6, 6]

When I group by upc and date and aggregate pred and act across stores with sum:

df.groupby(by=["upc", "date"]).agg({"pred": "sum", "act": "sum"})

all the lists are concatenated:

                                pred                          act
upc date                                                          
1   jan   [1, 1, 1, 2, 2, 2, 3, 3, 3]  [4, 4, 4, 5, 5, 5, 6, 6, 6]

I want the element-wise sum of the lists instead, something like this:

   upc date       pred           act
0    1  jan  [6, 6, 6]  [15, 15, 15]

Upvotes: 1

Views: 203

Answers (2)

jezrael

Reputation: 863166

Use a lambda function that converts each group's lists to a 2-D NumPy array and sums along axis=0:

import numpy as np

f = lambda x: np.array(x.tolist()).sum(axis=0).tolist()
df = df.groupby(by=["upc", "date"], as_index=False).agg({"pred": f, "act": f})
print(df)
   upc date       pred           act
0    1  jan  [6, 6, 6]  [15, 15, 15]
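
For context, "sum" concatenates here because the cells hold Python lists, and adding two lists with + joins them, whereas NumPy arrays add position by position. A tiny illustration (the variable names are only for this sketch):

```python
import numpy as np

a, b = [1, 1, 1], [2, 2, 2]

# list + list joins the two lists end to end
concatenated = a + b

# array + array adds the values at each position
elementwise = (np.array(a) + np.array(b)).tolist()

print(concatenated)  # [1, 1, 1, 2, 2, 2]
print(elementwise)   # [3, 3, 3]
```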

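The same solution with a named function instead of a lambda:

def f(x):
    # Stack the group's lists into a 2-D array and sum down the rows
    return np.array(x.tolist()).sum(axis=0).tolist()

df = df.groupby(by=["upc", "date"], as_index=False).agg({"pred": f, "act": f})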
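
If you want something to paste and run, here is a minimal end-to-end sketch using the frame from the question. The names elementwise_sum and out are mine, chosen only for illustration, and the sketch assumes every list within a group has the same length; with lists of different lengths the stacking step will not give a regular 2-D array.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "upc": [1, 1, 1],
    "store": [1, 2, 3],
    "date": ["jan", "jan", "jan"],
    "pred": [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    "act": [[4, 4, 4], [5, 5, 5], [6, 6, 6]],
})

def elementwise_sum(s):
    # s is the Series of lists for one group.
    # Stack them into a 2-D array (one row per list), sum down the rows,
    # and convert back to a plain Python list.
    return np.array(s.tolist()).sum(axis=0).tolist()

out = df.groupby(["upc", "date"], as_index=False).agg(
    {"pred": elementwise_sum, "act": elementwise_sum}
)
print(out)
```

Assigning to out rather than back to df keeps the original frame around, which helps if you want to rerun the aggregation while experimenting.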

Upvotes: 2

U13-Forward

Reputation: 71610

Another option is to build a temporary DataFrame from each group's lists and sum its columns:

>>> f = lambda x: pd.DataFrame(x.values.tolist()).sum().tolist()
>>> df.groupby(['upc', 'date'], as_index=False).agg({"pred": f, "act": f})
   upc date       pred           act
0    1  jan  [6, 6, 6]  [15, 15, 15]
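
If you would rather avoid NumPy or an intermediate DataFrame, a plain-Python variant with zip should give the same result for this data. Treat it as a sketch rather than a benchmarked recommendation: zip_sum and out are names I made up, and zip silently truncates to the shortest list if the lengths differ.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "upc": [1, 1, 1],
    "store": [1, 2, 3],
    "date": ["jan", "jan", "jan"],
    "pred": [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    "act": [[4, 4, 4], [5, 5, 5], [6, 6, 6]],
})

def zip_sum(s):
    # zip(*s) groups the i-th elements of every list in the group together,
    # so summing each tuple gives the element-wise total.
    return [sum(vals) for vals in zip(*s)]

out = df.groupby(["upc", "date"], as_index=False).agg(
    {"pred": zip_sum, "act": zip_sum}
)
print(out)
```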

Upvotes: 1
