How to create column based on conditions on other rows - Pandas Dataframe?

Question

I have the following problem :

A dataframe named df1 like this :

    Id      PVF PM_year Year
0   A6489   75  25      2018
1   A175    56  54      2018
2   A2856   34  65      2018
3   A6489   35  150     2019
4   A175    45  700     2019
5   A2856   55  120     2019
6   A6489   205 100     2020
7   A2856   35  445     2020

I want to create a new column named PM_previous_year which is equal for each combination (ID+Year) to the value of PM_year of the same Id and the previous year...

Example :

For the line indexed 3, the Id is 'A6489' and the year is 2019. So the value of the new column "PM_previous_year" should be the value of the line where Id is the same ('A6489') and year is equal to 2018 (2019-1). in this simple example it corresponds to the line indexed 0 and the value expected for the new column on line indexed 3 is then 25.

Finally, the targeted DataFrame df2 for this short example looks like this :

    Id     PVF  PM_year Year    PM_previous_year
0   A6489   75  25      2018    NaN
1   A175    56  54      2018    NaN
2   A2856   34  65      2018    NaN
3   A6489   35  150     2019    25.0
4   A175    45  700     2019    54.0
5   A2856   55  120     2019    65.0
6   A6489   205 100     2020    150.0
7   A2856   35  445     2020    120.0

I havn't found any obvious solution yet. Maybe there is a way in reshaping the df, but I'm not very familiar with that. If somebody have any idea, I would be very grateful. Thks

jezrael · Accepted Answer

If possible simplify solution and shifting PM_year per Id use:

df['PM_previous_year'] = df.groupby('Id')['PM_year'].shift()
print (df)
      Id  PVF  PM_year  Year  PM_previous_year
0  A6489   75       25  2018               NaN
1   A175   56       54  2018               NaN
2  A2856   34       65  2018               NaN
3  A6489   35      150  2019              25.0
4   A175   45      700  2019              54.0
5  A2856   55      120  2019              65.0
6  A6489  205      100  2020             150.0
7  A2856   35      445  2020             120.0

Or:

s = df.pivot('Year','Id','PM_year').shift().unstack().rename('PM_previous_year')

df = df.join(s, on=['Id','Year'])
print (df)
      Id  PVF  PM_year  Year  PM_previous_year
0  A6489   75       25  2018               NaN
1   A175   56       54  2018               NaN
2  A2856   34       65  2018               NaN
3  A6489   35      150  2019              25.0
4   A175   45      700  2019              54.0
5  A2856   55      120  2019              65.0
6  A6489  205      100  2020             150.0
7  A2856   35      445  2020             120.0

How to create column based on conditions on other rows - Pandas Dataframe?

Answers (1)

Related Questions