Reputation: 1327
Consider the simple dataframe example using pandas:
df = pd.DataFrame({'x' : [10, 20, 30, 40]}, index = ['0','1','2', '3'])
This gives the following:
index x
0 10
1 20
2 30
3 40
I'm trying to take values of x and for each row produce a result (via a lambda) that also utilises the previous row calculation. That is, I'd like to calculate y[i+1] as a function of x[i+1] and y[i]. So for example:
y[i+1] = sin(x[i+1]) + (15 * y[i])
So this would give the following DataFrame:
index x y
0 10 -0.54
1 20 -7.2
2 30 -109.7
3 40 -1644.7
For the first row, this is presumably a special case (as there is no y[-1])? So I'd like to give this a specific number.
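As a baseline, the recurrence above can be written as a plain loop. This is only a sketch: it assumes a seed of 0 in place of the missing y[-1] (which reproduces the numbers in the table), and the helper name recurrence and its y_init parameter are made up for illustration.

```python
import math
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['0', '1', '2', '3'])

def recurrence(xs, y_init=0.0):
    """Return y where y[i] = sin(x[i]) + 15 * y[i-1], using y_init as y[-1]."""
    ys = []
    prev = y_init  # assumed seed standing in for the missing y[-1]
    for x in xs:
        prev = math.sin(x) + 15 * prev
        ys.append(prev)
    return ys

df['y'] = recurrence(df['x'])
print(df)
```

Passing a different y_init changes the special-case value used for the first row.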
I have been trying to solve this with expanding_apply, but with no joy. Thanks.
UPDATE
So I answered my question, in a way I understand, with the help below (thank you):
df.loc[:, 'y'] = 0
initial_y_val = 10
for i in range(df.shape[0]):
    if i == 0:
        # first row: seed with the chosen initial value
        df.iloc[0, 1] = initial_y_val + df.iloc[0, 0]
    else:
        # later rows: current x plus the previous row's y
        df.iloc[i, 1] = df.iloc[i, 0] + df.iloc[i - 1, 1]
print(df)
This gives:
x y
0 10 20
1 20 40
2 30 70
3 40 110
So my question is, is there a more idiomatic (and faster) way of achieving the same outcome?
Upvotes: 3
Views: 946
Reputation: 31171
There is cumsum from pandas, which solves your problem:
df['y'] = df.x.cumsum()
In [171]: df
Out[171]:
x y
0 10 10
1 20 30
2 30 60
3 40 100
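The output above differs from the loop in the question's update by a constant 10, because that loop seeds the first row with initial_y_val. A minimal sketch, assuming the same initial_y_val of 10 as in the question, that adds the seed back on top of the cumulative sum:

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['0', '1', '2', '3'])

initial_y_val = 10  # seed taken from the question's update

# The running total of x, shifted by the seed, matches the explicit loop.
df['y'] = df['x'].cumsum() + initial_y_val
print(df)
```

This works only because the update's rule is a pure running sum; it does not cover the sin-based recurrence.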
Edit:
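Very nice question indeed. By expanding y1, y2, ..., yn you can see that each term is a growing polynomial of sin(x) whose coefficients are powers of 15: yn = sin(xn) + 15*sin(x(n-1)) + ... + 15^n*sin(x0). I would opt for this solution, iterating over the DataFrame index (z[:i] is a label-based slice here, so it includes row i):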
import math
import numpy as np
z = df.x.map(math.sin)
df['y'] = [sum(z[:i] * 15**np.arange(int(i) + 1)[::-1]) for i, r in df.iterrows()]
In [258]: df
Out[258]:
x y
0 10 -0.544021
1 20 -7.247371
2 30 -109.698603
3 40 -1644.733929
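The list comprehension recomputes the whole sum for every row, so its cost grows quadratically with the number of rows. One possible alternative, offered as a sketch rather than as the pandas-idiomatic answer, is itertools.accumulate from the standard library, which carries the previous result forward in a single pass. It assumes, as above, that the first row is simply sin(x[0]).

```python
import math
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['0', '1', '2', '3'])

z = df['x'].map(math.sin)

# accumulate passes the previous output back in as y_prev,
# so each step computes y = sin(x) + 15 * y_prev.
df['y'] = list(accumulate(z, lambda y_prev, z_i: z_i + 15 * y_prev))
print(df)
```

Because the first element is emitted unchanged, the first row is sin(10), which matches the values shown above.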
Upvotes: 1