DsCpp

Reputation: 2489

Use the previously calculated row in the pandas apply method

Can I use the previously calculated result from apply(axis=1) when evaluating the current row?

I have this df:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC')).assign(String_column=list('abcde'))
df

    A           B            C         String_column
0   0.297925    -1.025012   1.307090   'a'
1   -1.527406   0.533451    -0.650252  'b'
2   -1.646425   0.738068    0.562747   'c'
3   -0.045872   0.088864    0.932650   'd'
4   -0.964226   0.542817    0.873731   'e'

and I'm trying to add to each row the value of the previous row multiplied by 0.5, without touching the string column (e.g. row = row + previous_row * 0.5). This is the code I have so far:

def calc_by_previous_answer(row):
    # only the current row is available here, so I can't get the previous one
    row = row * 0.5
    return row

# adding a shift here will not propagate the previous answer
df = df.apply(calc_by_previous_answer, axis=1)
df

Upvotes: 0

Views: 59

Answers (2)

jezrael

Reputation: 862751

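It is not easy, but it is possible: select the previous row's values with loc, and use DataFrame.select_dtypes to restrict the calculation to the numeric columns. Note that this version reads the original (not yet updated) previous row: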

def calc_by_previous_answer(row):
    # row.name is the index label of the current row
    # the first row has no previous row, so it is returned unchanged
    if row.name > 0:
        row = df.loc[row.name-1, c] * 0.5 + row
#    else:
#        row = row * 0.5
    return row

c =  df.select_dtypes(np.number).columns
df[c] = df[c].apply(calc_by_previous_answer, axis=1)
print (df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.410128  1.004794  0.237621           'c'
3 -0.869085  0.457898  1.214023           'd'
4 -0.987162  0.587249  1.340056           'e'

A solution without apply, using DataFrame.add and shift:

c = df.select_dtypes(np.number).columns
df[c] = df[c].add(df[c].shift() * 0.5, fill_value=0)
print (df)

          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.410128  1.004794  0.237621           'c'
3 -0.869085  0.457898  1.214023           'd'
4 -0.987162  0.587249  1.340056           'e'
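
To make the shift-based version easier to check by hand, here is a minimal sketch on made-up data (the values and the two numeric columns are assumptions for illustration, not the frame from the question). shift() moves every row down by one, so the first row gets NaN, and fill_value=0 makes add() treat that NaN as 0:

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so the arithmetic is easy to verify by hand
df = pd.DataFrame({
    "A": [1.0, 2.0, 4.0],
    "B": [10.0, 20.0, 40.0],
    "String_column": ["a", "b", "c"],
})

# Only the numeric columns take part in the calculation
c = df.select_dtypes(np.number).columns

# Each row becomes: current row + 0.5 * ORIGINAL previous row.
# The first row has no previous row; fill_value=0 leaves it unchanged.
result = df[c].add(df[c].shift() * 0.5, fill_value=0)
print(result)
```

With this input, column A should come out as 1.0, 2.5, 5.0 and column B as 10.0, 25.0, 50.0, while String_column is never touched.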

EDIT: If each row should use the already updated previous row, loop over the rows:

c = df.select_dtypes(np.number).columns
for idx, row in df.iterrows():
    if row.name > 0:
        df.loc[idx, c] = df.loc[idx-1, c] * 0.5 + df.loc[idx, c]

print (df)
          A         B         C String_column
0  0.297925 -1.025012  1.307090           'a'
1 -1.378443  0.020945  0.003293           'b'
2 -2.335647  0.748541  0.564393           'c'
3 -1.213695  0.463134  1.214847           'd'
4 -1.571074  0.774384  1.481154           'e'
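
The difference between the two results (original previous row versus updated previous row) shows up from the third row onward. A small sketch on made-up data, assuming a default integer index as in the question, puts the two side by side:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a default RangeIndex (0, 1, 2, 3)
df = pd.DataFrame({"A": [1.0, 2.0, 4.0, 8.0], "String_column": list("abcd")})
c = df.select_dtypes(np.number).columns

# One-step version: every row adds half of the ORIGINAL previous row
one_step = df[c].add(df[c].shift() * 0.5, fill_value=0)

# Cumulative version: every row adds half of the UPDATED previous row,
# so the rows have to be processed in order
cumulative = df[c].copy()
for i in range(1, len(cumulative)):
    cumulative.loc[i, c] = cumulative.loc[i - 1, c] * 0.5 + cumulative.loc[i, c]

print(one_step["A"].tolist())    # rows use the untouched input
print(cumulative["A"].tolist())  # rows feed into each other
```

Here the first two rows agree (1.0 and 2.5), but the third row is 4 + 0.5 * 2 = 5.0 in the one-step version and 4 + 0.5 * 2.5 = 5.25 in the cumulative one.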

Upvotes: 1

Shaido

Reputation: 28332

There is no need to use apply. Since you want each row's calculation to use the updated value of the previous row, you need a for loop:

cols = ['A','B','C']
for i in range(1, len(df)):
    df.loc[i, cols] = df.loc[i-1, cols] * 0.5 + df.loc[i, cols]

Result:

            A           B          C String_column
0    0.297925   -1.025012   1.307090           'a'
1   -1.378443    0.020945   0.003293           'b'
2   -2.335647    0.748541   0.564393           'c'
3   -1.213695    0.463134   1.214847           'd'
4   -1.571074    0.774384   1.481154           'e'
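
If the frame is large, repeated df.loc assignments can get slow. One possible variation (a sketch, not part of the original answer) is to run the same loop on a NumPy copy of the numeric columns and write the result back once. The data and column list below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'cols' lists the numeric columns to update
df = pd.DataFrame({
    "A": [1.0, 2.0, 4.0, 8.0],
    "B": [0.0, 2.0, 0.0, 2.0],
    "String_column": list("abcd"),
})
cols = ["A", "B"]

# Work on an explicit float copy so the loop does not go through df.loc
values = df[cols].to_numpy(dtype=float, copy=True)
for i in range(1, len(values)):
    # same recurrence as above: row i += 0.5 * (already updated) row i-1
    values[i] += 0.5 * values[i - 1]

# Write all updated rows back in one assignment
df[cols] = values
print(df)
```

The loop is still sequential, because each row depends on the one before it, but it touches plain arrays instead of going through label-based indexing on every step.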

Upvotes: 0
