Reputation: 79
I am trying to create a new column in a pandas DataFrame that holds the score for the same ID in the next year. See the sample original data below:
Year ID Score
2018 785 8.4
2018 770 -1.2
2017 733 3.2
2017 785 7.9
2018 733 3.9
If there is no data for the next year, the new column should be filled with NA. So the output I'm looking for would be:
Year ID Score col
2018 785 8.4 NA
2018 770 -1.2 NA
2017 733 3.2 3.9
2017 785 7.9 8.4
2018 733 3.9 NA
The data is not currently ordered.
Upvotes: 1
Views: 41
Reputation: 150825
If every ID has consecutive years in your data (for example, no ID with rows for 2016 and 2018 but none for 2017), then you can do:
df['col'] = df.sort_values('Year').groupby('ID').Score.shift(-1)
Output:
Year ID Score col
0 2018 785 8.4 NaN
1 2018 770 -1.2 NaN
2 2017 733 3.2 3.9
3 2017 785 7.9 8.4
4 2018 733 3.9 NaN
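As a self-contained sketch of the shift approach, the snippet below rebuilds the sample data from the question and then runs the same line on a small frame with a gap. The `gap` frame and its values are hypothetical, made up only to show why the consecutive-years caveat matters: with a missing year, `shift(-1)` picks up the next available row for that ID rather than the row for the following year.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Year": [2018, 2018, 2017, 2017, 2018],
    "ID": [785, 770, 733, 785, 733],
    "Score": [8.4, -1.2, 3.2, 7.9, 3.9],
})

# Sort by year, then shift each ID's scores up by one row.
# The shifted Series keeps the original index labels, so the
# assignment aligns back to the unsorted frame.
df["col"] = df.sort_values("Year").groupby("ID")["Score"].shift(-1)
print(df)

# Hypothetical data with a gap: ID 1 has 2016 and 2018 but no 2017.
gap = pd.DataFrame({"Year": [2018, 2016], "ID": [1, 1], "Score": [5.0, 4.0]})
gap["col"] = gap.sort_values("Year").groupby("ID")["Score"].shift(-1)

# The 2016 row receives the 2018 score, not NaN.
print(gap)
```

If that behaviour is not what you want, the gap case needs a join on the year itself rather than a positional shift.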
If years are not guaranteed to be consecutive, then do a merge:
df.merge(df.assign(Year=lambda x: x.Year - 1),
         on=['Year', 'ID'],
         suffixes=['', '_new'],
         how='left')
Output:
Year ID Score Score_new
0 2018 785 8.4 NaN
1 2018 770 -1.2 NaN
2 2017 733 3.2 3.9
3 2017 785 7.9 8.4
4 2018 733 3.9 NaN
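A runnable sketch of the merge approach, assuming each (Year, ID) pair appears at most once. The extra row for ID 770 in 2015 is hypothetical and is there only to create a gap in the years; the `validate="m:1"` argument is an optional addition, not part of the original one-liner, that makes pandas raise if the shifted copy has duplicate keys.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row
# (2015, 770, 0.5) so that ID 770 has non-consecutive years.
df = pd.DataFrame({
    "Year": [2018, 2018, 2017, 2017, 2018, 2015],
    "ID": [785, 770, 733, 785, 733, 770],
    "Score": [8.4, -1.2, 3.2, 7.9, 3.9, 0.5],
})

# Relabel every row with the previous year, so a row for year Y
# becomes the "next year" match for the row of year Y - 1.
next_year = df.assign(Year=df["Year"] - 1)

# A left join keeps every original row; rows without a match get NaN.
result = df.merge(
    next_year,
    on=["Year", "ID"],
    how="left",
    suffixes=("", "_new"),
    validate="m:1",  # assumption: (Year, ID) is unique in the data
)
print(result)
```

Here the 2015 row for ID 770 gets NaN in `Score_new` because there is no 2016 row for that ID, whereas a positional shift would have pulled in the 2018 score.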
Upvotes: 3