Reputation: 857
This could be formulated as a more general question, but my specific problem is this: I want to convert columns of date strings into datetime objects, but some of the entries are empty strings. For example:
df = pd.DataFrame({'A': ['2000.02.25', ''], 'B': ['', '2003.05.26']})
I want the returned dataframe to keep the empty dates as NaN or NaT. For the sake of speed, I do not want to use pd.to_datetime, which otherwise works perfectly but is an order of magnitude slower than datetime.datetime:
df['A'] = [datetime.datetime.strptime(x, '%Y.%m.%d') for x in df['A']]
However, the problem is how to deal with the empty strings. If I add an if len(x) > 0 filter to the comprehension, the returned list is shorter than the column and can no longer be assigned back to it.
Upvotes: 2
Views: 2149
Reputation: 13259
df['A'] = [datetime.datetime.strptime(x, '%Y.%m.%d') if x else pd.NaT for x in df['A']]
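The construct a if cond else b can also be used outside of list comprehensions; it is Python's conditional expression, often called the ternary operator. Unlike a trailing if filter, it produces a value for every element, so the list keeps its length.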
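To make the difference concrete, here is a minimal sketch applied to the dataframe from the question. The helper name parse_or_nat is made up for illustration and is not part of pandas or the standard library:

```python
import datetime

import pandas as pd

df = pd.DataFrame({'A': ['2000.02.25', ''], 'B': ['', '2003.05.26']})

# A trailing "if" filters elements out, so the result is shorter than the column.
filtered = [x for x in df['A'] if x]

def parse_or_nat(text, fmt='%Y.%m.%d'):
    """Hypothetical helper: parse a date string, mapping '' to pd.NaT."""
    # The conditional expression returns a value in both branches.
    return datetime.datetime.strptime(text, fmt) if text else pd.NaT

# One output value per input value, so the list can be assigned back.
for col in df.columns:
    df[col] = [parse_or_nat(x) for x in df[col]]
```

Wrapping the expression in a small function is only a readability choice; the one-line comprehension above does the same thing.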
I'm also a bit surprised that df = df.astype(pd.datetime) doesn't win out, but my guess is that it hits an exception on every empty row. This performance hit may be worth reporting as a bug.
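If you want to stay within pandas, one variant worth timing is pd.to_datetime with an explicit format, which spares pandas from guessing the layout of each string. This is only a sketch: whether it actually beats the list comprehension depends on your pandas version and data, so measure before relying on it. errors='coerce' is included so that anything unparseable becomes NaT instead of raising.

```python
import pandas as pd

df = pd.DataFrame({'A': ['2000.02.25', ''], 'B': ['', '2003.05.26']})

# Convert every column with a fixed format; unparseable entries become NaT.
converted = df.apply(
    lambda col: pd.to_datetime(col, format='%Y.%m.%d', errors='coerce')
)
```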
Upvotes: 2