Reputation: 170
Dec 47
Nov 36
Oct 14
Sep 2
Jan 2
Aug 2
May 1
Apr 1
Jun 1
Jul 1
Feb 1
Name: date, dtype: int64
I'm tring to sort the above series whose index column is month, by month. However instead of sorting by month's calendar order the sort function is sorting by dictionary order of the month name. How can I sort the above correctly? Guess I have to specify that the index type is month and not string. Any help is appreciated. Code snippet below.
import calendar
movies = release_dates[release_dates.title.str.contains('Christmas') & (release_dates.country=='USA')]
movies = movies.date.dt.month.apply(lambda x: calendar.month_abbr[x])
counts = movies.value_counts()
counts
Upvotes: 4
Views: 8415
Reputation: 862481
You can use sorted CategoricalIndex
with sort_index
:
cats = ['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec']
df.index = pd.CategoricalIndex(df.index, categories=cats, ordered=True)
df = df.sort_index()
print (df)
date
Jan 2
Feb 1
Apr 1
May 1
Jun 1
Jul 1
Aug 2
Sep 2
Oct 14
Nov 36
Dec 47
Or use DataFrame.reindex
- but if some value is missing add NaNs rows:
df = df.reindex(cats)
Upvotes: 7
Reputation: 21
Adding to the very helpful answer by @jezrael:
In pandas 0.25.1 sorted
has been replaced by ordered
per pandas.CategoricalIndex
Old way:
df.index = pd.CategoricalIndex(df.index,
categories=['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec'],
sorted=True)
df = df.sort_index()
Error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-468-3f0ab66734d4> in <module>
2 net.index = pd.CategoricalIndex(net.index,
3 categories=['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec'],
----> 4 sorted=True)
5 net = net.sort_index()
6 net
TypeError: __new__() got an unexpected keyword argument 'sorted'
New way:
df.index = pd.CategoricalIndex(df.index,
categories=['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec'],
ordered=True)
df = df.sort_index()
Upvotes: 2
Reputation: 170
Okay it was not very complex. I'm sure Categorical would have worked just that I was unable to solve the problem using Categorical. What I did was-
I'm sure there are more efficient ways of solving this, so if you have a better way please post the same.
import calendar
months = release_dates[release_dates.title.str.contains('Christmas') & (release_dates.country=='USA')].date.dt.month
counts = months.value_counts()
counts.sort_index(inplace=True)
counts.index = map(lambda x: calendar.month_abbr[x], counts.index)
counts.plot.bar()
Upvotes: 1