kevfefe

Reputation: 31

Calculate a value in Pandas that is based on a product of past values without looping

I have a dataframe that represents time series probabilities, where each row represents one time period. Each value in column 'Single' is the probability of the event in that time period. Each value in column 'Cumulative' is the probability of the event having occurred in every time period before that point (i.e. it is the product of every value in 'Single' from time 0 up to, but not including, the current row).

A simplified version of the dataframe looks like this:

      Single  Cumulative
0   0.990000    1.000000
1   0.980000    0.990000
2   0.970000    0.970200
3   0.960000    0.941094
4   0.950000    0.903450
5   0.940000    0.858278
6   0.930000    0.806781
7   0.920000    0.750306
8   0.910000    0.690282
9   0.900000    0.628157
10  0.890000    0.565341

In order to calculate the 'Cumulative' column based on the 'Single' column I am looping through the dataframe like this:

for index, row in df.iterrows():
    df.loc[index, 'Cumulative'] = df['Single'].iloc[:index].prod()

In reality, there is a lot of data and looping is a drag on performance. Is it possible to achieve this without looping?

I've tried to find a way to vectorize this calculation or even use the pandas.DataFrame.apply function, but I don't believe I'm able to reference the current index value in either of those methods.

Upvotes: -1

Views: 205

Answers (1)

Toby Petty

Reputation: 4660

There's a built-in function for this in Pandas:

df.cumprod()
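Note that cumprod() includes the current row in each product, whereas the sample table in the question starts at 1.0 and only multiplies earlier rows. A minimal sketch of one way to reproduce that table, assuming a default integer index and the 'Single' values from the question (the shift step is my reading of the sample data, not something stated outright in the question):

```python
import pandas as pd

# Rebuild the 'Single' column from the question's sample data.
df = pd.DataFrame({"Single": [0.99, 0.98, 0.97, 0.96, 0.95, 0.94,
                              0.93, 0.92, 0.91, 0.90, 0.89]})

# cumprod() gives the running product including the current row.
# Shifting down by one row turns it into the product of all earlier rows,
# and fill_value=1.0 supplies the empty product for row 0.
df["Cumulative"] = df["Single"].cumprod().shift(1, fill_value=1.0)

print(df)
```

If the product including the current row is what you actually want, drop the shift and use df["Single"].cumprod() on its own.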

Upvotes: 3
