Reputation: 59
I have a multi-indexed dataframe, where the left-most index level is NBA Player and the second level is NBA Season (e.g. 2018-19). I'd like to add a column that numbers each player's seasons. For example, on the head of the dataframe below, I'd like to add a column next to Season that lists A.J. Guyton's 2000-01 season as '1' and his 2001-02 season as '2'. The numbering would then restart for the next player, and so on throughout the dataframe.
                     Age   Tm   OBPM   BPM  DBPM
Player      Season
A.J. Guyton 2000-01   22  CHI  -0.57  -2.8  -2.1
            2001-02   23  CHI  -0.80  -3.4  -2.4
A.J. Price  2009-10   23  IND  -0.75  -2.2  -1.1
            2010-11   24  IND  -1.51  -3.1  -1.0
            2011-12   25  IND  -0.35  -2.2  -1.4
I'm new to pandas and relatively new to Python altogether, so this is likely a simple question but I'm not sure how to even approach it since every player's start year is different.
Upvotes: 0
Views: 50
Reputation: 21572
You can use the split/apply/combine pattern with groupby and cumcount. cumcount behaves like a transform: it returns a Series with the same index as the original dataframe, numbering the rows within each group in their existing order. That is in contrast with an aggregation (like mean), which returns one value per group.
df['career_year'] = df.groupby(level='Player').cumcount()
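With your data, this will give the following. Note that cumcount starts counting at 0, so add 1 to the result if you want the first season to be numbered 1: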
                     Age   Tm   OBPM   BPM  DBPM  career_year
Player      Season
A.J. Guyton 2000-01   22  CHI  -0.57  -2.8  -2.1            0
            2001-02   23  CHI  -0.80  -3.4  -2.4            1
A.J. Price  2009-10   23  IND  -0.75  -2.2  -1.1            0
            2010-11   24  IND  -1.51  -3.1  -1.0            1
            2011-12   25  IND  -0.35  -2.2  -1.4            2
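For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the five sample rows from the question and numbers the seasons from 1, as originally asked. The way the frame is constructed (MultiIndex.from_tuples) and the sort_index call are my assumptions, not something the question shows; the sort is only there so that cumcount sees each player's seasons in chronological order.

```python
import pandas as pd

# Rebuild the sample rows from the question (construction is an assumption;
# the original post does not show how the frame was created).
index = pd.MultiIndex.from_tuples(
    [
        ("A.J. Guyton", "2000-01"),
        ("A.J. Guyton", "2001-02"),
        ("A.J. Price", "2009-10"),
        ("A.J. Price", "2010-11"),
        ("A.J. Price", "2011-12"),
    ],
    names=["Player", "Season"],
)
df = pd.DataFrame(
    {
        "Age": [22, 23, 23, 24, 25],
        "Tm": ["CHI", "CHI", "IND", "IND", "IND"],
        "OBPM": [-0.57, -0.80, -0.75, -1.51, -0.35],
        "BPM": [-2.8, -3.4, -2.2, -3.1, -2.2],
        "DBPM": [-2.1, -2.4, -1.1, -1.0, -1.4],
    },
    index=index,
)

# cumcount numbers rows in their current order, so sort by (Player, Season)
# first to make sure each player's seasons are chronological.
df = df.sort_index()

# cumcount is 0-based; add 1 so a player's first season is numbered 1.
df["career_year"] = df.groupby(level="Player").cumcount() + 1

# For contrast: an aggregation returns one row per player, not one per season.
mean_age = df.groupby(level="Player")["Age"].mean()

print(df)
print(mean_age)
```

Sorting on the "YYYY-YY" strings works here because they sort lexicographically in the same order as the seasons; if your Season labels are formatted differently, you may need another sort key.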
Upvotes: 1
Reputation: 1154
You should include code that generates your sample data; it makes it easier for others to help you.
dataframe['Season'] = 2
will create a new column 'Season' and set every row of it to 2.
Upvotes: 0