Zhen Sun

Reputation: 857

Convert a string of dates with missing values

This may be formulated into a more general question, but the problem I have is like this: I want to convert a string of dates into datetime objects, but the string contains empty dates. For example:

df = pd.DataFrame({'A': ['2000.02.25', ''], 'B': ['', '2003.05.26']})

I want the returned DataFrame to keep the empty dates as NaN or NaT. For the sake of speed, I do not want to use pd.to_datetime, which otherwise works perfectly but is an order of magnitude slower than datetime.datetime.strptime:

df['A'] = [datetime.datetime.strptime(x, '%Y.%m.%d') for x in df['A']]

However, the problem is how to deal with the empty strings. If I add an if len(x) > 0 filter to the comprehension, the empty entries are dropped and the returned list no longer has the same length as the column.

Upvotes: 2

Views: 2149

Answers (1)

U2EF1

Reputation: 13259

df['A'] = [datetime.datetime.strptime(x, '%Y.%m.%d') if x else pd.NaT for x in df['A']]

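The construct a if cond else b is Python's conditional expression (its ternary operator). It is not specific to list comprehensions and can be used anywhere an expression is allowed. Unlike a trailing if filter, it yields a value for every element, so the list keeps the same length as the column.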
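As a minimal, self-contained sketch of the same idea applied to both columns of the example frame: the helper name parse_or_nat is made up for illustration, and the format string is the one from the question.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'A': ['2000.02.25', ''], 'B': ['', '2003.05.26']})


def parse_or_nat(text, fmt='%Y.%m.%d'):
    """Hypothetical helper: parse a date string, mapping '' to pd.NaT."""
    # The conditional expression always returns something, so the output
    # has exactly one entry per input value.
    return datetime.datetime.strptime(text, fmt) if text else pd.NaT


for col in ['A', 'B']:
    df[col] = [parse_or_nat(x) for x in df[col]]

print(df)
```

Note that this only treats the empty string as missing. Any other malformed value would still raise ValueError from strptime.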

I'm also a bit surprised that df = df.astype(pd.datetime) doesn't win out, but my guess is that it's hitting an exception on every empty row. A performance hit like that may be worth reporting as a bug.
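For comparison, it may be worth timing pd.to_datetime with an explicit format before ruling it out, since that spares pandas from guessing the layout of each string. This is only a sketch; I have not benchmarked it against the list comprehension, so treat any speed difference as something to measure on your own data.

```python
import pandas as pd

df = pd.DataFrame({'A': ['2000.02.25', ''], 'B': ['', '2003.05.26']})

# Passing format= tells pandas the exact layout up front.
# errors='coerce' turns anything unparseable into NaT, which covers the
# empty strings but would also silently hide genuinely malformed dates.
converted = df.apply(
    lambda col: pd.to_datetime(col, format='%Y.%m.%d', errors='coerce')
)

print(converted)
```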

Upvotes: 2
