JA-pythonista
JA-pythonista

Reputation: 1323

Match datetime YYYY-MM-DD object in pandas dataframe

I have a pandas DataFrame of the form:

    id     amount           birth
0   4      78.0      1980-02-02 00:00:00
1   5      24.0      1989-03-03 00:00:00
2   6      49.5      2014-01-01 00:00:00
3   7      34.0      2014-01-01 00:00:00
4   8      49.5      2014-01-01 00:00:00

I am interested in only the year, month and day in the birth column of the dataframe. I tried to leverage on the Python datetime from pandas but it resulted into an error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1054-02-07 00:00:00

The birth column is an object dtype.

My guess would be that it is an incorrect date. I would not like to pass the parameter errors="coerce" into the to_datetime method, because each item is important and I need just the YYYY-MM-DD.

I tried to leverage on the regex from pandas:

df["birth"].str.find("(\d{4})-(\d{2})-(\d{2})")

But this is returning NANs. How can I resolve this?

Thanks

Upvotes: 1

Views: 722

Answers (1)

jezrael
jezrael

Reputation: 862581

Because not possible convert to datetimes you can use split by first whitespace and then select first value:

df['birth'] = df['birth'].str.split().str[0]

And then if necessary convert to periods.

Representing out-of-bounds spans.

print (df)
   id  amount                birth
0   4    78.0  1980-02-02 00:00:00
1   5    24.0  1989-03-03 00:00:00
2   6    49.5  2014-01-01 00:00:00
3   7    34.0  2014-01-01 00:00:00
4   8    49.5     0-01-01 00:00:00

def to_per(x):
    splitted = x.split('-')
    return pd.Period(year=int(splitted[0]), 
                     month=int(splitted[1]), 
                     day=int(splitted[2]), freq='D')

df['birth'] = df['birth'].str.split().str[0].apply(to_per)

print (df)
   id  amount       birth
0   4    78.0  1980-02-02
1   5    24.0  1989-03-03
2   6    49.5  2014-01-01
3   7    34.0  2014-01-01
4   8    49.5  0000-01-01

Upvotes: 1

Related Questions