Puffin
Puffin

Reputation: 141

Pandas to_datetime doesn't work as hoped with format %d.%m.%Y

I have a pandas dataframe with a text column containing strings in the format:

28.08.1958
29.04.1958
01.02.1958
05.03.1958

that I want to interpret as dates. The dataframe arises from using beautifulsoup, i.e. I've not read it in from csv, so I planned to use pd.to_datetime(). There are occasional non-date entries so I added the errors='ignore'.

df["Date2"] = pd.to_datetime(df["Date"], format='%d.%m.%Y', errors='ignore')

It looks to me as if this isn't working as I have used a subsequent sort operation:

df.sort_values(by="Date2", ascending=True)

and this does change the order but seemingly randomly, not into date order. I wondered if there might be whitespace amongst the dates so as a precaution I used:

['Date'].str.strip()

but no improvement.

I also tried adding:

inplace=True

to the sort values but this results in the whole column being sorted by the days part of the date, which is really telling me that there is no conversion to date going on.

In summary I presume that all the input strings are being treated as errors and ignored. Perhaps this means that the parameter format='%d.%m.%Y' isn't right.

EDIT In response to the comments/answer so far I've found a way of inducing an input dataset that has no errors in it. This seems to sort ok. What's more I can see that the data type of the "Date2" column depends upon whether there are errors in that column: if there is non-date text then the column is of type object, if no errors it is datetime64[ns]

Solution I've set errors='coerce' within the to_datetime statement.

Upvotes: 2

Views: 3802

Answers (1)

I'mahdi
I'mahdi

Reputation: 24049

Try this:

import pandas as pd
import io

csv_data = '''                  
Date
28.08.1958
29.04.1958
01.02.1958
05.03.1958
'''
df = pd.read_csv(io.StringIO(csv_data))
df["Date2"] = pd.to_datetime(df["Date"], format='%d.%m.%Y')
df.sort_values(by="Date2", ascending=True, inplace=True)
print(df)

         Date      Date2
2  01.02.1958 1958-02-01
3  05.03.1958 1958-03-05
1  29.04.1958 1958-04-29
0  28.08.1958 1958-08-28

Upvotes: 2

Related Questions