Reputation: 1
I am using pandas to read a .csv file and want to analyze the data by month. The first 5 rows of the file look like this:
date value
01.04.2017 208.04
01.04.2017 81
01.04.2017 280
01.04.2017 403.08
01.04.2017 71.1
So I use:
df1['date']=pd.to_datetime(df1['date'], format='%d.%m.%Y')
df1['month']=df1['date'].dt.strftime('%B')
However, when I look at my new month column, I get the following:
print(df1['month'].unique())
>>['April' 'May' 'June' 'July' 'August' 'September' 'January' 'October'
'November' 'December' 'February' 'March']
January comes after September, even though the dates in the original .csv are in the correct order. Does anyone know where the problem comes from, or how to fix it? Thank you in advance!
P.S. I import the file with:
df1=pd.read_csv("GF2017_2018.csv", delimiter=';',dtype=str, index_col=False, encoding='latin-1')
Upvotes: 0
Views: 47
Reputation: 862511
I think you need to sort by the datetimes, because your rows are not in chronological order:
df1['date']=pd.to_datetime(df1['date'], format='%d.%m.%Y')
df1 = df1.sort_values('date')
df1['month']=df1['date'].dt.strftime('%B')
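A minimal sketch of the sorting approach on a few made-up rows. The sample dates and values, including the misplaced January row, are assumptions for illustration and are not taken from the real file. It shows that `unique()` follows row order before sorting and chronological order after it.

```python
import pandas as pd

# Hypothetical sample data: one January row sits between September and October
df1 = pd.DataFrame({
    "date": ["01.04.2017", "15.09.2017", "03.01.2018", "02.10.2017", "05.02.2018"],
    "value": ["208.04", "81", "280", "403.08", "71.1"],
})

df1["date"] = pd.to_datetime(df1["date"], format="%d.%m.%Y")

# Before sorting, unique() follows row order, so January appears before October
print(df1["date"].dt.strftime("%B").unique())

# After sorting by the parsed datetimes, the months follow the calendar across years
df1 = df1.sort_values("date")
df1["month"] = df1["date"].dt.strftime("%B")
print(df1["month"].unique())
```

Sorting on the parsed `date` column (not the original strings) matters: the `dd.mm.yyyy` strings would sort by day first.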
Another solution, if you need the months in calendar order, is to convert them to an ordered categorical:
months = ['January','February','March','April','May','June','July','August',
'September','October','November','December']
df1['month'] = pd.Categorical(df1['date'].dt.strftime('%B'), ordered=True, categories=months)
df1 = df1.sort_values('month')
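A sketch of how the ordered categorical behaves, again on hypothetical sample rows. Converting `value` with `pd.to_numeric` is an added assumption, needed because the file is read with `dtype=str`. Sorting or grouping on the categorical `month` column follows the calendar order given in `categories`, not alphabetical order. Note that this merges the same month from different years.

```python
import pandas as pd

months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']

# Hypothetical sample data; 'value' is text because the file is read with dtype=str
df1 = pd.DataFrame({
    "date": ["01.04.2017", "15.09.2017", "03.01.2018", "02.10.2017", "20.04.2017"],
    "value": ["208.04", "81", "280", "403.08", "71.1"],
})
df1["date"] = pd.to_datetime(df1["date"], format="%d.%m.%Y")
df1["value"] = pd.to_numeric(df1["value"])  # assumption: values are plain decimals

# Ordered categorical: sorting follows the calendar, not the alphabet
df1["month"] = pd.Categorical(df1["date"].dt.strftime("%B"),
                              categories=months, ordered=True)

# Sorting on the categorical column puts rows in calendar-month order
by_month = df1.sort_values("month")
print(by_month["month"].unique())

# Grouping on the categorical returns groups in category order;
# observed=True drops months that never occur in the data
totals = df1.groupby("month", observed=True)["value"].sum()
print(totals)
```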
Upvotes: 0
Reputation: 88226
As stated in the documentation of pandas.Series.unique, unique values are returned in order of appearance, and nothing in your code changes the order of the rows. I would double-check the actual order of the months in the data.
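One way to do that check, sketched on hypothetical rows standing in for the real file, is to test whether the parsed dates are monotonic. Then list the rows whose date is earlier than the date on the previous row, which are the points where the chronological order breaks.

```python
import pandas as pd

# Hypothetical sample standing in for GF2017_2018.csv; a January row is out of place
df1 = pd.DataFrame({
    "date": ["01.04.2017", "15.09.2017", "03.01.2018", "02.10.2017"],
    "value": ["208.04", "81", "280", "403.08"],
})
df1["date"] = pd.to_datetime(df1["date"], format="%d.%m.%Y")

# True only if every date is >= the date before it
print(df1["date"].is_monotonic_increasing)

# Rows whose date is earlier than the previous row's date
# (the first diff is NaT, which compares as False and is never flagged)
out_of_order = df1[df1["date"].diff() < pd.Timedelta(0)]
print(out_of_order)
```

Here the October row is flagged because it follows the misplaced January row. Inspecting the rows just before each flagged index shows where the file is out of order.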
Upvotes: 1