Reputation: 131
If I have these columns in a dataframe:
a b
1 5
1 7
2 3
1,2 3
2 5
How do I create column c
where column b
is summed using groupings of column a
(string), keeping the existing dataframe. Some rows can belong to more than one group.
a b c
1 5 15
1 7 15
2 3 11
1,2 3 26
2 5 11
Is there an easy and efficient solution as the dataframe I have is very large.
Upvotes: 2
Views: 694
Reputation: 863741
You can first need split column a
and join
it to original DataFrame
:
print (df.a.str.split(',', expand=True)
.stack()
.reset_index(level=1, drop=True)
.rename('a'))
0 1
1 1
2 2
3 1
3 2
4 2
Name: a, dtype: object
df1 = df.drop('a', axis=1)
.join(df.a.str.split(',', expand=True)
.stack()
.reset_index(level=1, drop=True)
.rename('a'))
print (df1)
b a
0 5 1
1 7 1
2 3 2
3 3 1
3 3 2
4 5 2
Then use transform
for sum
without aggragation.
df1['c'] = df1.groupby(['a'])['b'].transform(sum)
#cast for aggreagation join working with strings
df1['a'] = df1.a.astype(str)
print (df1)
b a c
0 5 1 15
1 7 1 15
2 3 2 11
3 3 1 15
3 3 2 11
4 5 2 11
Last groupby
by index and aggregate columns by agg
:
print (df1.groupby(level=0)
.agg({'a':','.join,'b':'first' ,'c':sum})
[['a','b','c']] )
a b c
0 1 5 15
1 1 7 15
2 2 3 11
3 1,2 3 26
4 2 5 11
Upvotes: 2