Anthony W

Reputation: 1327

Pandas: cumulative functions application

Consider this simple pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['0', '1', '2', '3'])

This gives the following:

index x
0     10
1     20
2     30
3     40

I'm trying to take values of x and for each row produce a result (via a lambda) that also utilises the previous row calculation. That is, I'd like to calculate y[i+1] as a function of x[i+1] and y[i]. So for example:

y[i+1] = sin(x[i+1]) + (15 * y[i])

So this would give the following DataFrame:

index x  y
0     10 -0.54
1     20 -7.2
2     30 -109.7
3     40 -1644.7

The first row is presumably a special case, as there is no y[-1], so I'd like to supply a specific starting value for it.

I have been trying to solve this with expanding_apply, but without success. Thanks.

UPDATE

So I answered my question, in a way I understand, with the help below (thank you):

df['y'] = 0
initial_y_val = 10

# y[0] = initial_y_val + x[0]; afterwards y[i] = x[i] + y[i-1]
for i in range(df.shape[0]):
    if i == 0:
        df.iloc[0, 1] = initial_y_val + df.iloc[0, 0]
    else:
        df.iloc[i, 1] = df.iloc[i, 0] + df.iloc[i - 1, 1]

print(df)

This gives:

    x    y
0  10   20
1  20   40
2  30   70
3  40  110

So my question is, is there a more idiomatic (and faster) way of achieving the same outcome?

Upvotes: 3

Views: 946

Answers (1)

Colonel Beauvel

Reputation: 31171

pandas has cumsum, which computes the running total directly:

df['y'] = df.x.cumsum()

In [171]: df
Out[171]:
    x    y
0  10   10
1  20   30
2  30   60
3  40  100
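
As a minimal sketch, assuming the starting value of 10 from the question's UPDATE is wanted, the offset can be added to the running total. The name initial_y_val is taken from the question; the rest is standard pandas.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['0', '1', '2', '3'])

# Starting value from the question's UPDATE (an assumption about what is wanted)
initial_y_val = 10

# cumsum gives the running total of x; adding the offset shifts every row by 10
df['y'] = df['x'].cumsum() + initial_y_val
print(df)
```

With this offset, y comes out as 20, 40, 70, 110, the same as the loop in the question.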

Edit:

Very nice question indeed. Expanding y[0], y[1], ..., y[n] shows that each value is a growing sum of sin(x) terms whose coefficients are powers of 15: y[n] = sin(x[n]) + 15*sin(x[n-1]) + ... + 15**n * sin(x[0]). I would compute it by iterating over the DataFrame index:

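import math
import numpy as np

z = df.x.map(math.sin)
# index labels are strings: z[:i] is an inclusive label slice, int(i) is the row position
df['y'] = [sum(z[:i] * 15 ** np.arange(int(i) + 1)[::-1]) for i in df.index]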

In [258]: df
Out[258]:
    x            y
0  10    -0.544021
1  20    -7.247371
2  30  -109.698603
3  40 -1644.733929
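
For a recurrence like this one, where each value depends on the previous result, a single pass with itertools.accumulate from the standard library is another option. This is a sketch rather than a benchmarked solution. It assumes, as in the output above, that the first row is simply sin(x[0]), and the lambda argument names are illustrative.

```python
import math
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['0', '1', '2', '3'])

# sin(x) for every row, computed once
z = df['x'].map(math.sin)

# accumulate carries the previous result forward:
# y[0] = z[0], then y[i] = z[i] + 15 * y[i-1]
df['y'] = list(accumulate(z, lambda y_prev, z_i: z_i + 15 * y_prev))
print(df)
```

This visits each row once, whereas the list comprehension above recomputes the whole weighted sum for every row.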

Upvotes: 1
