beta
beta

Reputation: 5696

pandas parse dates from csv

I am trying to read a csv file which includes dates. The csv looks like this:

h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420

For reading the csv I use this code:

df = pd.read_csv(filen,
    index_col=None,
    header=0,
    parse_dates=[5],
    date_parser=lambda t:parse(t))

The parse function looks like this:

def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except:
        return datetime.date(1900,1,1)

My strange problem now is that in the parsing function, t looks like this:

ndarray: ['20150420' '20150420' '20150420']

As you can see t is the whole array of the data column. I think it should be only the first value when parsing the first row, the second value, when parsing the second row, etc. Right now, the parse always ends up in the except-block because int(string_[:4]) contains a bracket, which, obviously, cannot be converted to an int. The parse function is built to parse only one date at a time (e.g. 20150420) in the first place.

What am I doing wrong?

EDIT:

okay, I just read in the pandas doc about the date_parser argument, and it seems to work as expected (of course ;)). So I need to adapt my code to that. My above example is copy&pasted from somewhere else and I expected it to work, hence, my question.. I will report back, when I did my code adaption.

EDIT2:

My parse function now looks like this, and I think, the code works now. If I am still doing something wrong, please let me know:

def parse(t):
    ret = []
    for ts in t:
        string_ = str(ts)
        try:
            tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
        except:
            tsdt = datetime.date(1900,1,1)
        ret.append(tsdt)
    return ret

Upvotes: 1

Views: 1618

Answers (2)

HYRY
HYRY

Reputation: 97331

There are six columns, but only five titles in the first line. This is why parse_dates failed. you can skip the first line:

df = pd.read_csv("tmp.csv",  header=None, skiprows=1, parse_dates=[5])

Upvotes: 2

steboc
steboc

Reputation: 1181

you can try this parser :

parser = lambda x: pd.to_datetime(x, format='%Y%m%d', coerce=True)

and use

df = pd.read_csv(filen,
index_col=None,
header=0,
parse_dates=[5],
date_parser=parser)

Upvotes: 2

Related Questions