Reputation: 11

Populate the current row based on the prev

A question was posted on the link below where one wanted to use a previous row to populate the current row:

Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?

In this case, there was only one index, the date.

Now I want to add a second index, employee ID; on the first occurrence of the first index, Index_EmpID, then I would like B to be populated with a value from A. On any subsequent occurrence, I would like the value from the previous row multiplied by the value from the current row.

I have the following data frame:

|Index_EmpID |Index_Date |   A  | B   |
|============|===========|======|=====|
|A123        |2022-01-31 |   1  | NaN |
|A123        |2022-02-28 |   1  | NaN |
|A123        |2022-03-31 | 1.05 | NaN |
|A123        |2022-04-30 |   1  | NaN |
|A567        |2022-01-31 |   1  | NaN |
|A567        |2022-02-28 | 1.05 | NaN |
|A567        |2022-03-31 |   1  | NaN |
|A567        |2022-04-30 | 1.05 | NaN |

I require:

|Index_EmpID |Index_Date |   A  |  B   |
|============|===========|======|======|
|A123        |2022-01-31 |   1  |  1   |
|A123        |2022-02-28 |   1  |  1   |
|A123        |2022-03-31 | 1.05 | 1.05 |
|A123        |2022-04-30 |   1  | 1.05 |
|A567        |2022-01-31 |   1  |  1   |
|A567        |2022-02-28 | 1.05 | 1.05 |
|A567        |2022-03-31 |   1  | 1.05 |
|A567        |2022-04-30 | 1.05 |1.1025|

Upvotes: 1

Answers (2)

Dima Chubarov

Reputation: 17169

A solution that uses iterrows is not as nice a solution as the one that uses groupby but it follows directly from the description and uses only the most elementary Pandas facilities.

empdf = pd.DataFrame({'Index_EmpID': (['A123']*4 + ['A567']*4),
                      'Index_Date': (['2022-01-31', '2022-02-28',
                                      '2022-03-31', '2022-04-30'] * 2),
                      'A': [1, 1, 1.05, 1, 1, 1.05, 1, 1.05], 
                      'B': ([np.nan]*8)})

past_id, past_b, bs = None, 1, []
for label, row in empdf.iterrows():
     if row['Index_EmpID'] == past_id:
        bs.append(past_b * row['A'])
     else:
        bs.append(row['A'])
     past_b = bs[-1]
     past_id = row['Index_EmpID']

empdf['B'] = bs

This would produce exactly the dataframe you requested

  Index_EmpID  Index_Date     A       B
0        A123  2022-01-31  1.00  1.0000
1        A123  2022-02-28  1.00  1.0000
2        A123  2022-03-31  1.05  1.0500
3        A123  2022-04-30  1.00  1.0500
4        A567  2022-01-31  1.00  1.0000
5        A567  2022-02-28  1.05  1.0500
6        A567  2022-03-31  1.00  1.0500
7        A567  2022-04-30  1.05  1.1025

Upvotes: 0

ags29

Reputation: 2696

Something like

import numpy as np
df.groupby("Index_EmpID")["A"].agg(np.cumprod).reset_index()

should work.

Upvotes: 2

Populate the current row based on the prev

Answers (2)

Related Questions