Reputation: 2489
Can I use the previously calculated answer from apply(axis=1) within the current row evaluation?
I have this df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
df['String_column'] = list('abcde')
df
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.527406  0.533451 -0.650252           'b'
2 -1.646425  0.738068  0.562747           'c'
3 -0.045872  0.088864  0.932650           'd'
4 -0.964226  0.542817  0.873731           'e'
and I'm trying to add to each row the value of the previous row multiplied by 0.5, without modifying the string column (e.g. row = row + previous_row * 0.5), where the previous row is the one that has already been updated.
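Written as a recurrence over the numeric columns, with x_i the original row i and y_i the updated row i, the requested update is the first line below. Unrolling it shows that every updated row is a decaying weighted sum of all earlier original rows. The last line is the non-propagating variant that only uses the original previous row, for comparison:

```latex
y_0 = x_0, \qquad y_i = x_i + 0.5\, y_{i-1} \quad (i \ge 1)
y_i = \sum_{k=0}^{i} 0.5^{\,i-k}\, x_k
y_i^{\text{no propagation}} = x_i + 0.5\, x_{i-1}
```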
This is the code I have so far:
def calc_by_previous_answer(row):
    # here I only have the current row, so I can't get the previous one
    row = row * 0.5
    return row

# adding a shift here will not propagate the previous answer
df = df.apply(calc_by_previous_answer, axis=1)
df
Upvotes: 0
Views: 59
Reputation: 862751
Not easy, but possible by selecting the previous row's values with loc. To select only the numeric columns, use DataFrame.select_dtypes:
def calc_by_previous_answer(row):
    # apply passes only the current row, so read the previous one from df with loc
    # the first row has no previous row, so it is returned unchanged
    if row.name > 0:
        row = df.loc[row.name - 1, c] * 0.5 + row
    # else:
    #     row = row * 0.5
    return row

c = df.select_dtypes(np.number).columns
df[c] = df[c].apply(calc_by_previous_answer, axis=1)
print(df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.410128  1.004794  0.237621           'c'
3 -0.869085  0.457898  1.214023           'd'
4 -0.987162  0.587249  1.340056           'e'
A solution without apply, using DataFrame.add:
c = df.select_dtypes(np.number).columns
df[c] = df[c].add(df[c].shift() * 0.5, fill_value=0)
print (df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.410128  1.004794  0.237621           'c'
3 -0.869085  0.457898  1.214023           'd'
4 -0.987162  0.587249  1.340056           'e'
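EDIT: the apply and shift solutions above add 0.5 times the original previous row, so the updated values do not propagate. To reuse the already updated previous row, loop over the rows: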
c = df.select_dtypes(np.number).columns
for idx, row in df.iterrows():
    if idx > 0:
        # df.loc[idx - 1, c] already holds the updated previous row
        df.loc[idx, c] = df.loc[idx - 1, c] * 0.5 + df.loc[idx, c]
print(df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.335647  0.748541  0.564393           'c'
3 -1.213695  0.463134  1.214847           'd'
4 -1.571074  0.774384  1.481154           'e'
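The loop above calls df.loc twice per row, which can get slow on large frames. A common alternative, sketched here and not part of the original answer, is to run the same recursion on a NumPy array and write the result back once. The helper name `propagate_previous` and its `factor` parameter are hypothetical:

```python
import numpy as np
import pandas as pd


def propagate_previous(df: pd.DataFrame, factor: float = 0.5) -> pd.DataFrame:
    """Return a copy where each numeric row becomes row + factor * (updated previous row).

    Non-numeric columns are left untouched. Hypothetical helper for illustration.
    """
    out = df.copy()
    cols = out.select_dtypes(np.number).columns
    # Work on a plain float array: indexing it is much cheaper than df.loc per row
    values = out[cols].to_numpy(dtype=float, copy=True)
    for i in range(1, len(values)):
        # values[i - 1] already holds the updated previous row, so the result propagates
        values[i] += factor * values[i - 1]
    out[cols] = values
    return out


df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [0.0, 4.0, -2.0],
                   "String_column": ["a", "b", "c"]})
print(propagate_previous(df))
```

The string column is never touched because only the columns returned by select_dtypes are written back.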
Upvotes: 1
Reputation: 28332
There is no need to use apply. Since you want to use the updated row value in the calculation of the following row, you need a for loop:
cols = ['A', 'B', 'C']
for i in range(1, len(df)):
    df.loc[i, cols] = df.loc[i - 1, cols] * 0.5 + df.loc[i, cols]
Result:
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.335647  0.748541  0.564393           'c'
3 -1.213695  0.463134  1.214847           'd'
4 -1.571074  0.774384  1.481154           'e'
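Because the recurrence y_i = x_i + 0.5 * y_(i-1) unrolls to y_i = sum over k <= i of 0.5^(i-k) * x_k, the loop can also be replaced by a single matrix product. This is a sketch that assumes the frame is small enough for an n x n weight matrix and that |factor| <= 1, so the weights stay bounded. The function name is hypothetical:

```python
import numpy as np
import pandas as pd


def propagate_previous_vectorized(df: pd.DataFrame, factor: float = 0.5) -> pd.DataFrame:
    """Loop-free version using y_i = sum_{k<=i} factor**(i-k) * x_k.

    Builds an n x n weight matrix, so it is only sensible for modest row counts.
    Hypothetical helper for illustration.
    """
    out = df.copy()
    cols = out.select_dtypes(np.number).columns
    x = out[cols].to_numpy(dtype=float)
    n = len(x)
    # lag[i, k] = i - k; only earlier or equal rows (lag >= 0) contribute
    lag = np.subtract.outer(np.arange(n), np.arange(n))
    weights = np.where(lag >= 0, factor ** np.clip(lag, 0, None), 0.0)
    out[cols] = weights @ x
    return out


df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [0.0, 4.0, -2.0],
                   "String_column": ["a", "b", "c"]})
print(propagate_previous_vectorized(df))
```

For three rows the weight matrix is [[1, 0, 0], [0.5, 1, 0], [0.25, 0.5, 1]], which reproduces the loop's results. For long frames the O(n^2) memory makes the plain loop the better choice.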
Upvotes: 0