Reputation: 20322
I have a data frame that I split from one column into two columns, like this.
df_all_files = pd.DataFrame(df_all_files.string.str.split('.',1).tolist(), columns = ['string','the_date'])
That give me a bunch of file extensions and a bunch of dates, all in the same column. I want to coerce anything that looks like it could be a date into an actual date, and delete anything that isn't a date. Is that doable?
Here's before sample of what I have.
string the_date
-rw-r--r-- 64 30067 10224 616 Nov 01 17:46 ASEJPN_ModelHolidays 20181101
-rw-r--r-- 64 30067 10224 616 Dec 03 19:23 ASEJPN_ModelHolidays 20181201
-rw-r--r-- 74 30067 10224 4938 Oct 04 03:28 AS1181003 RATE
This is what I'd like it to look like after.
string the_date
-rw-r--r-- 64 30067 10224 616 Nov 01 17:46 ASEJPN_ModelHolidays 20181101
-rw-r--r-- 64 30067 10224 616 Dec 03 19:23 ASEJPN_ModelHolidays 20181201
-rw-r--r-- 74 30067 10224 4938 Oct 04 03:28 AS1181003 181003
Now, I'm running this one-liner.
df_all_files['the_date'] = df_all_files['the_date'].dt.date
I'm getting this error.
AttributeError: Can only use .dt accessor with datetimelike values
I tried this as well.
df_all_files['the_date'] = df_all_files['string'].astype('datetime64[ns]')
As soon as it encounters a non-date, it give me this error.
ValueError: ('Unknown string format:', 'ach1')
Upvotes: 2
Views: 189
Reputation: 164693
Use pd.to_datetime
to convert your series to datetime
first, using errors='coerce'
to ensure non-convertible values are replaced by NaN
:
df_all_files['the_date'] = pd.to_datetime(df_all_files['the_date'], errors='coerce').dt.date
However, I strongly recommend you avoid conversion to dt.date
as this converts your series to an object
dtype series of datetime.date
objects, as opposed to an efficient Pandas datetime
series.
Here are 4 examples of problems caused by using Python datetime.date
or datetime.datetime
objects with Pandas:
Upvotes: 3