Reputation: 3379
I have something like the following DataFrame
where I have data points at 2 locations in 4 seasons in 2 years.
>>> df=pd.DataFrame(index=pd.MultiIndex.from_product([[1,2,3,4],[2011,2012],['A','B']], names=['Season','Year','Location']))
>>> df['Value']=np.random.randint(1,100,len(df))
>>> df
Value
Season Year Location
1 2011 A 40
B 7
2012 A 81
B 84
2 2011 A 37
B 59
2012 A 30
B 6
3 2011 A 71
B 43
2012 A 3
B 65
4 2011 A 45
B 13
2012 A 38
B 70
>>>
I would like to create a new series that represents that number of the season sorted by year. For example, the seasons in the first year would just be 1,2,3,4 and then the seasons in the second year would be 5,6,7,8. The series would look like this:
Season Year Location
1 2011 A 1
B 1
2012 A 5
B 5
2 2011 A 2
B 2
2012 A 6
B 6
3 2011 A 3
B 3
2012 A 7
B 7
4 2011 A 4
B 4
2012 A 8
B 8
Name: SeasonNum, dtype: int64
>>>
Any suggestions on the best way to do this?
Upvotes: 0
Views: 47
Reputation: 42905
You could do:
def seasons(row):
return row['Year'] % 2011 * 4 + row['Season']
df.reset_index(inplace=True)
df['seasons'] = df.apply(seasons, axis=1)
df.set_index(['Season', 'Year', 'Location'], inplace=True)
Upvotes: 1