ASH
ASH

Reputation: 20322

Convert strings to dates and delete non-dates in one column of a data frame

I have a data frame that I split from one column into two columns, like this.

df_all_files = pd.DataFrame(df_all_files.string.str.split('.',1).tolist(), columns = ['string','the_date'])

That give me a bunch of file extensions and a bunch of dates, all in the same column. I want to coerce anything that looks like it could be a date into an actual date, and delete anything that isn't a date. Is that doable?

Here's before sample of what I have.

                                                                     string     the_date

-rw-r--r--   64 30067    10224         616 Nov 01 17:46 ASEJPN_ModelHolidays    20181101
-rw-r--r--   64 30067    10224         616 Dec 03 19:23 ASEJPN_ModelHolidays    20181201
-rw-r--r--   74 30067    10224        4938 Oct 04 03:28 AS1181003               RATE

This is what I'd like it to look like after.

                                                                     string     the_date

-rw-r--r--   64 30067    10224         616 Nov 01 17:46 ASEJPN_ModelHolidays    20181101
-rw-r--r--   64 30067    10224         616 Dec 03 19:23 ASEJPN_ModelHolidays    20181201
-rw-r--r--   74 30067    10224        4938 Oct 04 03:28 AS1181003               181003

Now, I'm running this one-liner.

df_all_files['the_date'] = df_all_files['the_date'].dt.date

I'm getting this error.

AttributeError: Can only use .dt accessor with datetimelike values

I tried this as well.

df_all_files['the_date'] = df_all_files['string'].astype('datetime64[ns]')

As soon as it encounters a non-date, it give me this error.

ValueError: ('Unknown string format:', 'ach1')

Upvotes: 2

Views: 189

Answers (1)

jpp
jpp

Reputation: 164693

Use pd.to_datetime to convert your series to datetime first, using errors='coerce' to ensure non-convertible values are replaced by NaN:

df_all_files['the_date'] = pd.to_datetime(df_all_files['the_date'], errors='coerce').dt.date

However, I strongly recommend you avoid conversion to dt.date as this converts your series to an object dtype series of datetime.date objects, as opposed to an efficient Pandas datetime series.

Here are 4 examples of problems caused by using Python datetime.date or datetime.datetime objects with Pandas:

  1. TypeError: Cannot compare type 'Timestamp' with type 'date'
  2. Parse a Pandas column to Datetime
  3. Why my code didn't select data from Pandas dataframe?
  4. Filtering pandas dataframe by day

Upvotes: 3

Related Questions