Reputation: 5696
I am trying to read a csv file which includes dates. The csv looks like this:
h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420
For reading the csv I use this code:
df = pd.read_csv(filen,
index_col=None,
header=0,
parse_dates=[5],
date_parser=lambda t:parse(t))
The parse function looks like this:
def parse(t):
string_ = str(t)
try:
return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
except:
return datetime.date(1900,1,1)
My strange problem now is that in the parsing function, t
looks like this:
ndarray: ['20150420' '20150420' '20150420']
As you can see t
is the whole array of the data column. I think it should be only the first value when parsing the first row, the second value, when parsing the second row, etc. Right now, the parse
always ends up in the except-block because int(string_[:4])
contains a bracket, which, obviously, cannot be converted to an int. The parse function is built to parse only one date at a time (e.g. 20150420
) in the first place.
What am I doing wrong?
EDIT:
okay, I just read in the pandas doc about the date_parser argument, and it seems to work as expected (of course ;)). So I need to adapt my code to that. My above example is copy&pasted from somewhere else and I expected it to work, hence, my question.. I will report back, when I did my code adaption.
EDIT2:
My parse function now looks like this, and I think, the code works now. If I am still doing something wrong, please let me know:
def parse(t):
ret = []
for ts in t:
string_ = str(ts)
try:
tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
except:
tsdt = datetime.date(1900,1,1)
ret.append(tsdt)
return ret
Upvotes: 1
Views: 1618
Reputation: 97331
There are six columns, but only five titles in the first line. This is why parse_dates
failed. you can skip the first line:
df = pd.read_csv("tmp.csv", header=None, skiprows=1, parse_dates=[5])
Upvotes: 2
Reputation: 1181
you can try this parser :
parser = lambda x: pd.to_datetime(x, format='%Y%m%d', coerce=True)
and use
df = pd.read_csv(filen,
index_col=None,
header=0,
parse_dates=[5],
date_parser=parser)
Upvotes: 2