I have a dataframe that represents time series probabilities. Each row represents one time period, and each value in column 'Single' is the probability of the event occurring in that period. Each value in column 'Cumulative' is the probability of the event having occurred in every period before that point (i.e. it is the product of every value in 'Single' from time 0 up to, but not including, the current row, so the first row is 1).
A simplified version of the dataframe looks like this:
      Single  Cumulative
0   0.990000    1.000000
1   0.980000    0.990000
2   0.970000    0.970200
3   0.960000    0.941094
4   0.950000    0.903450
5   0.940000    0.858278
6   0.930000    0.806781
7   0.920000    0.750306
8   0.910000    0.690282
9   0.900000    0.628157
10  0.890000    0.565341
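For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample. The variable name `df` and the placeholder value for 'Cumulative' are assumptions for illustration; the real data is much larger.

```python
import pandas as pd

# 'Single' values copied from the sample table above.
single = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90, 0.89]

df = pd.DataFrame({"Single": single})

# Placeholder column (assumed); it gets overwritten by the calculation.
df["Cumulative"] = 1.0
```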
To calculate the 'Cumulative' column from the 'Single' column, I am looping through the dataframe like this:
    # Assumes the default integer index, so each label is also the row position.
    for index, row in df.iterrows():
        df.loc[index, 'Cumulative'] = df['Single'].iloc[:index].prod()
In reality, there is a lot of data and the loop is a drag on performance. Is it possible to achieve this without looping?
I've tried to find a way to vectorize this calculation or even use the pandas.DataFrame.apply function, but I don't believe I'm able to reference the current index value in either of those methods.
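One direction that might work, sketched under the assumption that the rows are already in time order: take a running product with `Series.cumprod` and shift it down one row with `Series.shift`, so each row holds the product of all earlier rows. The small frame below is a shortened stand-in for the real data, and the intermediate name `inclusive` is only for illustration.

```python
import pandas as pd

# Shortened stand-in for the real data (first five 'Single' values).
df = pd.DataFrame({"Single": [0.99, 0.98, 0.97, 0.96, 0.95]})

# Running product that includes the current row.
inclusive = df["Single"].cumprod()

# Shift down one row so each row holds the product of all earlier rows.
# The first row has no earlier rows, so it is filled with 1.0 (the empty product).
df["Cumulative"] = inclusive.shift(1, fill_value=1.0)
```

On this sample the result agrees with the looped version up to floating-point rounding, but I have not checked how it behaves with missing values or a non-default index.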