dataframe group, sum and concatenate

Question

I have a dataframe dfsorted :

dfsorted = df.sort_values(["sku"], ascending=[True])
print(dfsorted.head())

id	sku	bill	qty_left
186	01-04	50469	0
16	01-20	50262	15
267	01-20	50460	1
18	01-20	50262	5
17	01-20	50262	5

How can I group / aggregate the dfsorted into this desired result:

sku	bill	qty_left
01-04	50469	0
01-20	50262, 50460	26

So :

group the dataframe by 'sku'
for each 'sku', concatenate the 'bill' values (these are already formatted as strings, I don't care if there are duplicates but unique values would be nice too)
for each 'sku', sum the 'qty_left' values.

Thanks!

Paul · Accepted Answer

Use agg, where you can apply both custom (lambda) functions as standard (such as sum) functions:

df.groupby('sku').agg({'bill': lambda x: set(x), 'qty_left':'sum'})

set makes sure they are unique values, using list makes them just concatenated.

result:

        bill            qty_left
sku     
01-04   {50469}         0
01-20   {50460, 50262}  26

If you want a string instead of a set for bill you can use:

df2.bill.apply(lambda s: ', '.join(list(map(str, s))))

Where df2 is the result of the groupby.agg function above.

Answers (2)