Tahseen
Tahseen

Reputation: 1228

Pandas Cumulative Sum in GroupBy

I have a time series data with symbol and their respective values at a particular time.

index,symbol,value
01:00,A,10
01:00,B,15
01:01,A,15
01:01,B,25
01:02,A,30
01:02,B,45

Now I want create a 4th column, which has cumulative value on time series basis for each symbol but from each cumulative row, the first row value would be subtracted for each symbol respectively

index,symbol,value,adjustedCumulativeSum
01:00,A,10,0
01:00,B,15,0
01:01,A,15,15
01:01,B,25,25
01:02,A,30,45
01:02,B,45,70

I know how to do normal cumulative sum

df = df.reset_index().sort_values(['index','symbol'])
df['cumlativesum'] = df.groupby('symbol')['value'].cumsum()
df = df.set_index('index')

But do I deduct row 0 value from all cumulative sums?

Upvotes: 4

Views: 7630

Answers (3)

akuiper
akuiper

Reputation: 215137

You can subtract the first value (extracted with .iat[0]) for each group in a transform function:

df['cumlativesum'] = df.groupby('symbol')['value'].transform(lambda g: g.cumsum()-g.iat[0])
df = df.set_index('index')

df
#      symbol   value   cumlativesum
#index          
#01:00      A      10          0
#01:00      B      15          0
#01:01      A      15         15
#01:01      B      25         25
#01:02      A      30         45
#01:02      B      45         70

Upvotes: 2

jezrael
jezrael

Reputation: 863741

Use groupby with custom function with cumsum and substract first value selected by iat:

df['adjustedCumulativeSum']=df.groupby('symbol')['value'].apply(lambda x:x.cumsum()-x.iat[0])
print (df)
   index symbol  value  adjustedCumulativeSum
0  01:00      A     10                      0
1  01:00      B     15                      0
2  01:01      A     15                     15
3  01:01      B     25                     25
4  01:02      A     30                     45
5  01:02      B     45                     70

Upvotes: 3

BENY
BENY

Reputation: 323396

df.groupby('sy').val.apply(lambda x : x.cumsum()-x.values.tolist()[0])
Out[907]: 
0     0
1     0
2    15
3    25
4    45
5    70
Name: val, dtype: int64

Upvotes: 1

Related Questions