mohitos
mohitos

Reputation: 131

Python, pandas, cumulative sum in new column on matching groups

If I have these columns in a dataframe:

a     b  
1     5   
1     7
2     3
1,2   3
2     5

How do I create column c where column b is summed using groupings of column a (string), keeping the existing dataframe. Some rows can belong to more than one group.

a     b     c
1     5     15
1     7     15
2     3     11
1,2   3     26
2     5     11

Is there an easy and efficient solution as the dataframe I have is very large.

Upvotes: 2

Views: 694

Answers (1)

jezrael
jezrael

Reputation: 863741

You can first need split column a and join it to original DataFrame:

print (df.a.str.split(',', expand=True)
               .stack()
               .reset_index(level=1, drop=True)
               .rename('a'))
0    1
1    1
2    2
3    1
3    2
4    2
Name: a, dtype: object

df1 = df.drop('a', axis=1)
        .join(df.a.str.split(',', expand=True)
                      .stack()
                      .reset_index(level=1, drop=True)
                      .rename('a'))
print (df1)
   b  a
0  5  1
1  7  1
2  3  2
3  3  1
3  3  2
4  5  2

Then use transform for sum without aggragation.

df1['c'] = df1.groupby(['a'])['b'].transform(sum)
#cast for aggreagation join working with strings
df1['a'] = df1.a.astype(str)
print (df1)
   b  a   c
0  5  1  15
1  7  1  15
2  3  2  11
3  3  1  15
3  3  2  11
4  5  2  11

Last groupby by index and aggregate columns by agg:

print (df1.groupby(level=0)
          .agg({'a':','.join,'b':'first' ,'c':sum})
          [['a','b','c']] )

     a  b   c
0    1  5  15
1    1  7  15
2    2  3  11
3  1,2  3  26
4    2  5  11          

Upvotes: 2

Related Questions