pandas parse dates from csv

Question

I am trying to read a csv file which includes dates. The csv looks like this:

h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420

For reading the csv I use this code:

df = pd.read_csv(filen,
    index_col=None,
    header=0,
    parse_dates=[5],
    date_parser=lambda t:parse(t))

The parse function looks like this:

def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except:
        return datetime.date(1900,1,1)

My strange problem now is that in the parsing function, t looks like this:

ndarray: ['20150420' '20150420' '20150420']

As you can see t is the whole array of the data column. I think it should be only the first value when parsing the first row, the second value, when parsing the second row, etc. Right now, the parse always ends up in the except-block because int(string_[:4]) contains a bracket, which, obviously, cannot be converted to an int. The parse function is built to parse only one date at a time (e.g. 20150420) in the first place.

What am I doing wrong?

EDIT:

okay, I just read in the pandas doc about the date_parser argument, and it seems to work as expected (of course ;)). So I need to adapt my code to that. My above example is copy&pasted from somewhere else and I expected it to work, hence, my question.. I will report back, when I did my code adaption.

EDIT2:

My parse function now looks like this, and I think, the code works now. If I am still doing something wrong, please let me know:

def parse(t):
    ret = []
    for ts in t:
        string_ = str(ts)
        try:
            tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
        except:
            tsdt = datetime.date(1900,1,1)
        ret.append(tsdt)
    return ret

HYRY · Accepted Answer

There are six columns, but only five titles in the first line. This is why parse_dates failed. you can skip the first line:

df = pd.read_csv("tmp.csv",  header=None, skiprows=1, parse_dates=[5])

pandas parse dates from csv

Answers (2)

Related Questions