lalatnayak
lalatnayak

Reputation: 170

Pandas series sort by month index

Dec    47
Nov    36
Oct    14
Sep     2
Jan     2
Aug     2
May     1
Apr     1
Jun     1
Jul     1
Feb     1
Name: date, dtype: int64

I'm tring to sort the above series whose index column is month, by month. However instead of sorting by month's calendar order the sort function is sorting by dictionary order of the month name. How can I sort the above correctly? Guess I have to specify that the index type is month and not string. Any help is appreciated. Code snippet below.

import calendar
movies = release_dates[release_dates.title.str.contains('Christmas') & (release_dates.country=='USA')]
movies = movies.date.dt.month.apply(lambda x: calendar.month_abbr[x])
counts = movies.value_counts()
counts

Upvotes: 4

Views: 8415

Answers (3)

jezrael
jezrael

Reputation: 862481

You can use sorted CategoricalIndex with sort_index:

cats = ['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec']
df.index = pd.CategoricalIndex(df.index, categories=cats, ordered=True)
df = df.sort_index()

print (df)
     date
Jan     2
Feb     1
Apr     1
May     1
Jun     1
Jul     1
Aug     2
Sep     2
Oct    14
Nov    36
Dec    47

Or use DataFrame.reindex - but if some value is missing add NaNs rows:

df = df.reindex(cats)

Upvotes: 7

Michael Nishimura
Michael Nishimura

Reputation: 21

Adding to the very helpful answer by @jezrael:

In pandas 0.25.1 sorted has been replaced by ordered per pandas.CategoricalIndex

Old way:

df.index = pd.CategoricalIndex(df.index, 
                               categories=['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec'], 
                               sorted=True)
df = df.sort_index()

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-468-3f0ab66734d4> in <module>
      2 net.index = pd.CategoricalIndex(net.index, 
      3                                categories=['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec'],
----> 4                                sorted=True)
      5 net = net.sort_index()
      6 net

TypeError: __new__() got an unexpected keyword argument 'sorted'

New way:

df.index = pd.CategoricalIndex(df.index, 
                               categories=['Jan', 'Feb', 'Mar', 'Apr','May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec'], 
                               ordered=True)
df = df.sort_index()

Upvotes: 2

lalatnayak
lalatnayak

Reputation: 170

Okay it was not very complex. I'm sure Categorical would have worked just that I was unable to solve the problem using Categorical. What I did was-

  1. Sort by month while months were being represented as integers
  2. To the resulting series applied a mapper on the index to convert the integer month into an abbreviated string

I'm sure there are more efficient ways of solving this, so if you have a better way please post the same.

    import calendar
    months = release_dates[release_dates.title.str.contains('Christmas') & (release_dates.country=='USA')].date.dt.month
    counts = months.value_counts()
    counts.sort_index(inplace=True)
    counts.index = map(lambda x: calendar.month_abbr[x], counts.index)
    counts.plot.bar()

Upvotes: 1

Related Questions