user1692094

Reputation: 342

dataframe group, sum and concatenate

I have a dataframe, dfsorted:

dfsorted = df.sort_values(["sku"], ascending=[True])
print(dfsorted.head())
 id    sku   bill  qty_left
186  01-04  50469         0
 16  01-20  50262        15
267  01-20  50460         1
 18  01-20  50262         5
 17  01-20  50262         5

How can I group and aggregate dfsorted to get this result:

sku    bill          qty_left
01-04  50469         0
01-20  50262, 50460  26

So: one row per sku, with the distinct bill values concatenated and qty_left summed.

Thanks!

Upvotes: 0

Views: 44

Answers (2)

Paul

Reputation: 1897

Use agg, which lets you apply both custom (lambda) functions and standard ones (such as sum), one per column:

df.groupby('sku').agg({'bill': lambda x: set(x), 'qty_left':'sum'})

set keeps only the unique values in each group; list would instead collect every value, duplicates included.

result:

        bill            qty_left
sku     
01-04   {50469}         0
01-20   {50460, 50262}  26
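To make the set vs. list difference concrete, here is a minimal sketch. The dataframe is rebuilt by hand from the five rows shown in the question, so treat the construction (and the names as_set / as_list) as my assumptions, not the asker's real data:

```python
import pandas as pd

# Sample rows copied from the question's head() output (assumed to be representative)
df = pd.DataFrame({
    "id": [186, 16, 267, 18, 17],
    "sku": ["01-04", "01-20", "01-20", "01-20", "01-20"],
    "bill": [50469, 50262, 50460, 50262, 50262],
    "qty_left": [0, 15, 1, 5, 5],
})

# set: duplicates within each sku collapse to one entry
as_set = df.groupby("sku").agg({"bill": lambda x: set(x), "qty_left": "sum"})

# list: every bill value is kept, in row order, duplicates included
as_list = df.groupby("sku").agg({"bill": lambda x: list(x), "qty_left": "sum"})

print(as_set)
print(as_list)
```

With list, sku 01-20 carries 50262 three times, which is why set is the better fit for the desired output.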

If you want a string instead of a set for bill you can use:

df2['bill'].apply(lambda s: ', '.join(map(str, s)))

Where df2 is the result of the groupby.agg function above.
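Put together, the two steps might look like the sketch below. The sample dataframe is again hand-built from the question's rows, and the sorted() call is my own addition: a set has no guaranteed iteration order, so sorting first keeps the joined string stable between runs.

```python
import pandas as pd

# Sample rows copied from the question (assumed, not the asker's full data)
df = pd.DataFrame({
    "id": [186, 16, 267, 18, 17],
    "sku": ["01-04", "01-20", "01-20", "01-20", "01-20"],
    "bill": [50469, 50262, 50460, 50262, 50262],
    "qty_left": [0, 15, 1, 5, 5],
})

# Step 1: unique bills per sku as a set, quantities summed
df2 = df.groupby("sku").agg({"bill": lambda x: set(x), "qty_left": "sum"})

# Step 2: turn each set into a comma-separated string.
# sorted() is an addition here, used only to make the order deterministic.
df2["bill"] = df2["bill"].apply(lambda s: ", ".join(map(str, sorted(s))))

# Move sku from the index back into a regular column
result = df2.reset_index()
print(result)
```

Note that apply returns a new Series, so it has to be assigned back to df2["bill"] for the change to stick.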

Upvotes: 2

jezrael

Reputation: 862661

Use GroupBy.agg with a lambda function that removes duplicates while keeping the original order (this version assumes the bill values are already strings):

df1 = (df.groupby('sku', as_index=False)
         .agg({'bill': lambda x: ','.join(dict.fromkeys(x)),
               'qty_left': 'sum'}))
print(df1)
     sku         bill  qty_left
0  01-04        50469         0
1  01-20  50262,50460        26

If the bill column holds numbers rather than strings, cast it to str first, because str.join only accepts strings:

df1 = (df.astype({'bill': str})
         .groupby('sku', as_index=False)
         .agg({'bill': lambda x: ','.join(dict.fromkeys(x)),
               'qty_left': 'sum'}))
print(df1)
     sku         bill  qty_left
0  01-04        50469         0
1  01-20  50262,50460        26
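For anyone who wants to run this end to end, here is a self-contained sketch. The dataframe is rebuilt from the rows in the question (an assumption on my part), and as a small variation the str conversion is done inside the lambda with map(str, ...), so the original bill column keeps its dtype. dict.fromkeys drops repeats while keeping first-seen order, since dicts preserve insertion order in Python 3.7+.

```python
import pandas as pd

# Sample rows copied from the question; bill is stored as integers here
df = pd.DataFrame({
    "id": [186, 16, 267, 18, 17],
    "sku": ["01-04", "01-20", "01-20", "01-20", "01-20"],
    "bill": [50469, 50262, 50460, 50262, 50262],
    "qty_left": [0, 15, 1, 5, 5],
})

df1 = (df.groupby("sku", as_index=False)
         .agg({
             # convert each value to str, then de-duplicate in first-seen order
             "bill": lambda x: ", ".join(dict.fromkeys(map(str, x))),
             "qty_left": "sum",
         }))
print(df1)
```

The ", " separator matches the spacing in the question's desired output; use "," if you prefer the compact form shown above.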

Upvotes: 1
