Space
Space

Reputation: 1050

pd.to_datetime change date format producing wrong dates

I have extracted the table below from a csv file :

timestamp   user_id   cost   val1
01/01/2011     1     1     3
01/07/2012     1     19     57
01/09/2013     1     21     63
01/02/2011     2     20     8

And I for this purpose, I used the following statements : import pandas as pd

newnames = ['date','user_id', 'cost', 'val1']
df = pd.read_csv('expenses.csv', names = newnames, header = False)
df['timestamp'] = pd.to_datetime(df['timestamp'],format='%d%m%Y')

But the dates of table obtained in df are wrong : months and days are permutated without reason, i.e 01/09/2013 becomes 09/01/2013 without reason.

Many thanks in advance for your help.

Upvotes: 6

Views: 9536

Answers (2)

Pranzell
Pranzell

Reputation: 2495

Hi it happens sometimes due to US/Europe time parsing compatibilties.

What I follow is,

Case 1

If you are sure that your input file is having the time stamp in a correct format, and you want to use it on top of default parser, like in this case, use (as advised by @filmor) dayfirst option in pd.to_datetime

pd.to_datetime(df['timestamp'], dayfirst=True)

Case 2

When you are not sure about the correctness of the format, use the default, pd.to_datetime(df['timestamp'], dayfirst=True)

Upvotes: 9

filmor
filmor

Reputation: 32298

It is most certainly not random. Pandas defaults to the US date format and falls back if that doesn't make sense, i.e. "12/3/2014" becomes 2014-12-03 while "13/3/2014" will be parsed as 2014-03-13.

You can pass dayfirst=True to pd.read_csv to force European-style date parsing.

Upvotes: 7

Related Questions