Create new column in a DataFrame using values from a different row

Question

I am trying to create a new column in a pandas DataFrame that holds the score for the same ID in the next year. See the sample original data below:

Year  ID    Score
2018  785   8.4 
2018  770   -1.2
2017  733   3.2
2017  785   7.9
2018  733   3.9

If there is no data for the next year, the new column should be filled with NA. So the output I'm looking for would be:

Year  ID    Score col
2018  785   8.4   NA
2018  770   -1.2  NA
2017  733   3.2   3.9
2017  785   7.9   8.4
2018  733   3.9   NA

The data is not currently ordered.

Quang Hoang · Accepted Answer

If every ID has data for consecutive years only (for example, no ID that has 2016 and 2018 but not 2017), you can sort by year and shift the scores up by one row within each ID:

 df['col'] = df.sort_values('Year').groupby('ID').Score.shift(-1)

Output:

   Year   ID  Score  col
0  2018  785    8.4  NaN
1  2018  770   -1.2  NaN
2  2017  733    3.2  3.9
3  2017  785    7.9  8.4
4  2018  733    3.9  NaN
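
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same one-liner. The frame construction is my own transcription of the question's table, not code from the asker.

```python
import pandas as pd

# Sample data transcribed from the question (deliberately unordered).
df = pd.DataFrame({
    "Year": [2018, 2018, 2017, 2017, 2018],
    "ID": [785, 770, 733, 785, 733],
    "Score": [8.4, -1.2, 3.2, 7.9, 3.9],
})

# Sort by year, then within each ID pull the following row's Score up.
# The result keeps the original index, so the assignment aligns each
# value back to its row even though df itself is never reordered.
df["col"] = df.sort_values("Year").groupby("ID")["Score"].shift(-1)

print(df)
```

The index alignment on assignment is why the original row order is preserved in the output.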

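If years are not guaranteed to be consecutive, do a merge instead. Subtracting 1 from Year in a copy of the frame makes each row's key match the preceding year, so a left merge on Year and ID attaches the next year's score where one exists:

# shift each row's Year back by one, then left-join it onto the original
df.merge(df.assign(Year=lambda x: x.Year - 1),
         on=['Year', 'ID'],
         suffixes=('', '_new'),
         how='left')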

Output:

   Year   ID  Score  Score_new
0  2018  785    8.4        NaN
1  2018  770   -1.2        NaN
2  2017  733    3.2        3.9
3  2017  785    7.9        8.4
4  2018  733    3.9        NaN
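
To see why the distinction matters, here is a small sketch with a gap in the years. The frame `gap` and its values are made up for illustration and are not from the question. With shift, ID 1's 2016 row picks up the 2018 score; with the merge, it correctly stays NaN.

```python
import pandas as pd

# Hypothetical data: ID 1 has 2016 and 2018 but no 2017.
gap = pd.DataFrame({
    "Year": [2016, 2018, 2017, 2018],
    "ID": [1, 1, 2, 2],
    "Score": [5.0, 6.0, 7.0, 8.0],
})

# shift(-1) takes the next available row per ID, regardless of the year,
# so the 2016 row of ID 1 receives the 2018 score.
shifted = gap.sort_values("Year").groupby("ID")["Score"].shift(-1)

# The merge only matches rows whose year is exactly one later.
next_year = gap.assign(Year=gap["Year"] - 1)
merged = gap.merge(next_year, on=["Year", "ID"],
                   suffixes=("", "_new"), how="left")

print(shifted)
print(merged)
```

Note that merge returns a new frame rather than modifying the original, so assign the result if you want to keep it.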
