Reputation: 1050
I have extracted the table below from a csv file :
timestamp user_id cost val1
01/01/2011 1 1 3
01/07/2012 1 19 57
01/09/2013 1 21 63
01/02/2011 2 20 8
And I for this purpose, I used the following statements : import pandas as pd
newnames = ['date','user_id', 'cost', 'val1']
df = pd.read_csv('expenses.csv', names = newnames, header = False)
df['timestamp'] = pd.to_datetime(df['timestamp'],format='%d%m%Y')
But the dates of table obtained in df are wrong : months and days are permutated without reason, i.e 01/09/2013 becomes 09/01/2013 without reason.
Many thanks in advance for your help.
Upvotes: 6
Views: 9536
Reputation: 2495
Hi it happens sometimes due to US/Europe time parsing compatibilties.
What I follow is,
Case 1
If you are sure that your input file is having the time stamp in a correct format, and you want to use it on top of default parser, like in this case, use (as advised by @filmor) dayfirst option in pd.to_datetime
pd.to_datetime(df['timestamp'], dayfirst=True)
Case 2
When you are not sure about the correctness of the format, use the default, pd.to_datetime(df['timestamp'], dayfirst=True)
Upvotes: 9
Reputation: 32298
It is most certainly not random. Pandas defaults to the US date format and falls back if that doesn't make sense, i.e. "12/3/2014"
becomes 2014-12-03
while "13/3/2014"
will be parsed as 2014-03-13
.
You can pass dayfirst=True
to pd.read_csv
to force European-style date parsing.
Upvotes: 7