Reputation: 327
I have a following requirement with a sample csv file as below,
colA colB NewCol
1 val1 val1
1 val2 val1 + val2
1 val3 val1 + val2 + val3
2 val4 val4
2 val5 val4 + val5
2 val6 val4 + val5 + val6
3 val7 val7
3 val8 val7 + val8
3 val9 val7 + val8 + val9
I want to add a new column to df, such that it add successive colB values only for same ID value. Col B can be Integer or String.
Upvotes: 2
Views: 682
Reputation: 863631
Use np.cumsum
or pd.Series.cumsum
what perfectly working with numeric and strings:
df['NewCol'] = df.groupby('colA')['colB'].apply(np.cumsum)
Or:
df['NewCol'] = df.groupby('colA')['colB'].apply(pd.Series.cumsum)
print (df)
colA colB NewCol
0 1 val1 val1
1 1 val2 val1val2
2 1 val3 val1val2val3
3 2 val4 val4
4 2 val5 val4val5
5 2 val6 val4val5val6
6 3 val7 val7
7 3 val8 val7val8
8 3 val9 val7val8val9
Upvotes: 1
Reputation: 164843
Using groupby
+ cumsum
:
df['NewCol'] = df.groupby('colA')['colB'].cumsum()
For summing strings, you can use a list comprehension with groupby
:
df = df.sort_values('colA')
df['C'] = [''.join(g.iloc[:i+1].values) \
for _, g in df.groupby('colA')['colB'] \
for i in range(len(g.index))]
print(df)
colA colB C
0 1 val1 val1
1 1 val2 val1val2
2 1 val3 val1val2val3
3 2 val4 val4
4 2 val5 val4val5
5 2 val6 val4val5val6
6 3 val7 val7
7 3 val8 val7val8
8 3 val9 val7val8val9
Explanation
groupby
elements via for _, g in df.groupby(...)[...]
. We ignore the index _
here as we are only interested in the values.range(len(g.index))
, i.e. iterate over the number of items in the group.''.join
on the first i
values. We use g.iloc[:i+1]
to extract the first i
values in a group.Upvotes: 2