Reputation: 1228
I have a time series data with symbol and their respective values at a particular time.
index,symbol,value
01:00,A,10
01:00,B,15
01:01,A,15
01:01,B,25
01:02,A,30
01:02,B,45
Now I want create a 4th column, which has cumulative value on time series basis for each symbol but from each cumulative row, the first row value would be subtracted for each symbol respectively
index,symbol,value,adjustedCumulativeSum
01:00,A,10,0
01:00,B,15,0
01:01,A,15,15
01:01,B,25,25
01:02,A,30,45
01:02,B,45,70
I know how to do normal cumulative sum
df = df.reset_index().sort_values(['index','symbol'])
df['cumlativesum'] = df.groupby('symbol')['value'].cumsum()
df = df.set_index('index')
But do I deduct row 0 value from all cumulative sums?
Upvotes: 4
Views: 7630
Reputation: 215137
You can subtract the first value (extracted with .iat[0]
) for each group in a transform
function:
df['cumlativesum'] = df.groupby('symbol')['value'].transform(lambda g: g.cumsum()-g.iat[0])
df = df.set_index('index')
df
# symbol value cumlativesum
#index
#01:00 A 10 0
#01:00 B 15 0
#01:01 A 15 15
#01:01 B 25 25
#01:02 A 30 45
#01:02 B 45 70
Upvotes: 2
Reputation: 863741
Use groupby
with custom function with cumsum
and substract first value selected by iat
:
df['adjustedCumulativeSum']=df.groupby('symbol')['value'].apply(lambda x:x.cumsum()-x.iat[0])
print (df)
index symbol value adjustedCumulativeSum
0 01:00 A 10 0
1 01:00 B 15 0
2 01:01 A 15 15
3 01:01 B 25 25
4 01:02 A 30 45
5 01:02 B 45 70
Upvotes: 3
Reputation: 323396
df.groupby('sy').val.apply(lambda x : x.cumsum()-x.values.tolist()[0])
Out[907]:
0 0
1 0
2 15
3 25
4 45
5 70
Name: val, dtype: int64
Upvotes: 1