Neolan

Reputation: 1

pd.to_datetime orders the month wrong

I am using pandas to read a .csv file, and I want to analyze its data by month. The first 5 rows of the file look like this:

  date      value
01.04.2017  208.04
01.04.2017  81
01.04.2017  280
01.04.2017  403.08
01.04.2017  71.1

So I use:

df1['date']=pd.to_datetime(df1['date'], format='%d.%m.%Y')
df1['month']=df1['date'].dt.strftime('%B')

However, when I look at my new month column, I get the following:

print(df1['month'].unique())
>>['April' 'May' 'June' 'July' 'August' 'September' 'January' 'October'
 'November' 'December' 'February' 'March']

January comes after September, although the dates in the original .csv are correctly ordered. Does anyone know where the problem comes from, or how to solve it? Thank you in advance!

P.S. I import the file with:

df1=pd.read_csv("GF2017_2018.csv", delimiter=';',dtype=str, index_col=False, encoding='latin-1')

Upvotes: 0

Views: 47

Answers (2)

jezrael

Reputation: 862511

I think the dates in your data are not in chronological order, so you can sort by the datetime column after converting it:

df1['date']=pd.to_datetime(df1['date'], format='%d.%m.%Y')
df1 = df1.sort_values('date')

df1['month']=df1['date'].dt.strftime('%B')
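As a minimal sketch of the effect, assuming a small made-up frame (the rows below are hypothetical, not taken from GF2017_2018.csv), sorting by the parsed dates makes unique() list the months chronologically:

```python
import pandas as pd

# Hypothetical sample rows, deliberately out of chronological order.
df1 = pd.DataFrame({
    "date": ["01.09.2017", "15.01.2018", "01.04.2017", "01.10.2017"],
    "value": ["71.1", "81", "208.04", "280"],
})

# Parse day.month.year strings into real datetimes, then sort on them.
df1["date"] = pd.to_datetime(df1["date"], format="%d.%m.%Y")
df1 = df1.sort_values("date").reset_index(drop=True)

# Month names now appear in chronological order of the underlying dates.
df1["month"] = df1["date"].dt.strftime("%B")
months_in_order = list(df1["month"].unique())
print(months_in_order)
```

Note that strftime('%B') returns locale-dependent names; the sketch assumes an English locale.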

Another solution, if you need the months in calendar order, is to convert the month names to an ordered categorical:

months = ['January','February','March','April','May','June','July','August',
          'September','October','November','December']

df1['month'] = pd.Categorical(df1['date'].dt.strftime('%B'), ordered=True, categories=months)
df1 = df1.sort_values('date')
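A small sketch of what the ordered categorical buys you, again assuming hypothetical sample dates: sorting on the month column itself follows the calendar order given in categories rather than alphabetical order. Be aware that this ignores the year, so January 2018 sorts before April 2017.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical sample dates spanning two calendar years.
df = pd.DataFrame({
    "date": pd.to_datetime(["01.04.2017", "01.01.2018", "01.09.2017"],
                           format="%d.%m.%Y")
})

# Month names as an ordered categorical: comparisons and sorting
# use the position in `months`, not the spelling of the name.
df["month"] = pd.Categorical(df["date"].dt.strftime("%B"),
                             categories=months, ordered=True)

calendar_order = list(df["month"].sort_values().unique())
print(calendar_order)
```

If the data covers a span such as April 2017 to March 2018 and you want that order preserved, sorting by the date column is the better fit.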

Upvotes: 0

yatu

Reputation: 88226

As stated in the documentation of pandas.Series.unique, uniques are returned in order of appearance, and it doesn't seem that any of what you're doing would change the order of the data. I would double check the actual order of the months in the data.
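One way to do that check, sketched on hypothetical rows (the dates below are made up for illustration): unique() reports months in first-appearance order, is_monotonic_increasing tells you whether the dates are sorted at all, and diff() points at the rows where the order breaks.

```python
import pandas as pd

# Hypothetical data: a January 2018 row sits between September and October 2017.
df = pd.DataFrame({"date": ["01.09.2017", "15.01.2018", "01.10.2017", "01.02.2018"]})
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")
df["month"] = df["date"].dt.strftime("%B")

# unique() keeps the order in which values first appear.
months_seen = list(df["month"].unique())

# False here, because the dates are not in non-decreasing order.
is_sorted = df["date"].is_monotonic_increasing

# Rows whose date is earlier than the date of the row before them.
out_of_order = df[df["date"].diff() < pd.Timedelta(0)]

print(months_seen)
print(is_sorted)
print(out_of_order)
```

Running the same three checks on the real df1 should show whether a stray January row really appears before the October rows in the file.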

Upvotes: 1
