Reputation: 137
(python) I currently have a pandas dataframe that looks something like this:
player | year | points |
-----------------------------------------------
LeSean McCoy | 2012 | 199.3 |
-----------------------------------------------
LeSean McCoy | 2013 | 332.6 |
-----------------------------------------------
LeSean McCoy | 2014 | 200.4 |
-----------------------------------------------
I'm trying to add a new column to the dataframe that holds
a player's previous year points
.
I can do a groupby
that transforms the dataframe into one row in this example, with
each year
being its own column. However, I only want one added column, for example:
player | year | points | prev_year_pts |
-----------------------------------------------------------------------
LeSean McCoy | 2012 | 199.3 | 0 |
-----------------------------------------------------------------------
LeSean McCoy | 2013 | 332.6 | 199.3 |
-----------------------------------------------------------------------
LeSean McCoy | 2014 | 200.4 | 332.6 |
-----------------------------------------------------------------------
The true dataframe I'm working with has more than 300 unique player names, so I've been trying to get a solution on this example that would be able to also work with a different player name in the sample, with a desired output like this:
player | year | points | prev_year_pts |
------------------------------------------------------------------------------
LeSean McCoy | 2012 | 199.3 | 0 |
------------------------------------------------------------------------------
LeSean McCoy | 2013 | 332.6 | 199.3 |
------------------------------------------------------------------------------
LeSean McCoy | 2014 | 200.4 | 332.6 |
------------------------------------------------------------------------------
Christian McCaffrey | 2017 | 228.6 | 0 |
------------------------------------------------------------------------------
Christian McCaffrey | 2018 | 385.5 | 228.6 |
------------------------------------------------------------------------------
Christian McCaffrey | 2019 | 471.2 | 385.5 |
------------------------------------------------------------------------------
I've been able to add a prev_year
column with the following code:
example["prev_year"] = [x-1 for x in example.groupby(["player"])["year"].get_group("LeSean McCoy")]
But I'm stuck on how to get the prev_year_points
from that, and how to implement in a way
that could calculate that for each player
observation ...
Upvotes: 2
Views: 766
Reputation: 6483
You can try to sort the values by player
and year
at first, and then do groupby
+ shift
:
df=df.sort_values(['player','year'])
df['prev_year_pts']=df.groupby('player')['points'].shift(fill_value=0)
So a little example with the sample you give:
#create the dataframe
d={'player': {0: 'LeSean McCoy', 1: 'LeSean McCoy', 2: 'LeSean McCoy', 3: 'Christian McCaffrey', 4: 'Christian McCaffrey', 5: 'Christian McCaffrey'},
'year': {0: 2013, 1: 2012, 2: 2014, 3: 2019, 4: 2018, 5: 2017}, 'points': {0: 199.3, 1: 332.6, 2: 200.4, 3: 228.6, 4: 385.5, 5: 471.2}}
df=pd.DataFrame(d)
df
# player year points
#0 LeSean McCoy 2013 199.3
#1 LeSean McCoy 2012 332.6
#2 LeSean McCoy 2014 200.4
#3 Christian McCaffrey 2019 228.6
#4 Christian McCaffrey 2018 385.5
#5 Christian McCaffrey 2017 471.2
df=df.sort_values(['player','year'])
df
# player year points
#5 Christian McCaffrey 2017 471.2
#4 Christian McCaffrey 2018 385.5
#3 Christian McCaffrey 2019 228.6
#1 LeSean McCoy 2012 332.6
#0 LeSean McCoy 2013 199.3
#2 LeSean McCoy 2014 200.4
df['prev_year_pts']=df.groupby('player')['points'].shift(fill_value=0)
df
# player year points prev_year_pts
#5 Christian McCaffrey 2017 471.2 0.0
#4 Christian McCaffrey 2018 385.5 471.2
#3 Christian McCaffrey 2019 228.6 385.5
#1 LeSean McCoy 2012 332.6 0.0
#0 LeSean McCoy 2013 199.3 332.6
#2 LeSean McCoy 2014 200.4 199.3
Upvotes: 2