JL1515
JL1515

Reputation: 431

How to deal with multiple date string formats in a python series

I have a csv file which I am trying to complete operations on. I have created a dataframe with one column titled "start_date" which has the date of warranty start. The problem I have encountered is that the format of the date is not consistent. I would like to know the number of days passed from today's calendar date and the date warranty started for this product.

Two examples of the entries in this start_date series:

9/11/15
9/11/15 0:00

How can I identify each of these formats and treat them accordingly?

Upvotes: 6

Views: 8899

Answers (3)

EJ Lee
EJ Lee

Reputation: 1

Not sure if this will help, but this is what I do when I'm working with Pandas on excel files and want the date format to be 'mm/dd/yyyy' or some other.

writer = pd.ExcelWriter(filename, engine='xlsxwriter', datetime_format='mm/dd/yyyy')
df.to_excel(writer, sheetname)

Maybe it'll work with: df.to_csv

Upvotes: 0

alksdjg
alksdjg

Reputation: 1159

Unfortunately you just have to try each format it might be. If you give an example format, strptime will attempt to parse it for you as discussed here.

The code will end up looking like:

import datetime    

POSSIBLE_DATE_FORMATS = ['%m/%d/%Y', '%Y/%m/%d', etc...] # all the formats the date might be in

for date_format in POSSIBLE_DATE_FORMATS :
    try:
        parsed_date = datetime.strptime(raw_string_date, date_format) # try to get the date
        break # if correct format, don't test any other formats
    except ValueError:
        pass # if incorrect format, keep trying other formats

Upvotes: 13

chill_turner
chill_turner

Reputation: 519

You have a few options really. I'm not entirely sure what happens when you try to directly load the file with a 'pd.read_csv' but as suggested above you can define a set of format strings that you can try to use to parse the data.

One other option would be to read the date column in as a string and then parse it yourself. If you want the column to be like 'YYYY-MM-DD' then parse the string to have just that data and then save it back, something like.

import pandas as prandas
import datetime

df = prandas.read_csv('supa_kewl_data.dis_fmt_rox', dtype={'start_date': str})

print df.head()
# we are interested in start_date

date_strs = df['start_date'].values
#YYYY-MM-DD
#012345678910
filter_date_strs = [x[0:10] for x in date_strs]
df['filter_date_strs] = filter_date_strs

# sometimes i've gotten complained at by pandas for doing this
# try doing df.loc[:,'filter_date_strs'] = filter_date_strs
# if you get some warning thing

# if you want you can convert back to date time using a 
dobjs = [datetime.datetime.strptime(x,'%Y-%m-%d') for x in filter_date_strs]
df['dobj_start_date'] = dobjs

df.to_csv('even_better_data.csv', index=False)

Hopefully this helps! Pandas documentation is sketchy sometimes, looking at the doc in 0.16.2 for read_csv() is intimidating... http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html The library itself is stellar!

Upvotes: 0

Related Questions