Reputation: 1570
My sincere apologies for asking a question which has already been comprehensively resolved more than once. My problem is that I have tried to copy-paste, as is, examples from SO posts but noe work for me. I have data in the form "dd mm YYYY HH mm ss value value". I want to read this as a pandas dataframe with the first column as a datestring. Here are the examples I tried from SO posts: 1. first sample data
01 05 2013 00 00 00 26.4 16.5
02 05 2013 00 00 00 25.9 17.7
03 05 2013 00 00 00 26.6 17.4
......
The zeroes for HH mm ss were added just to conform exactly to the examples.
Secondly the packages imported
from datetime import datetime import pandas as pd import os from cStringIO import StringIO
fname = os.path.expanduser('./temps0.txt')
Now the three examples I tried
a) from SO post:
Parse dates when year month day and hour are in separate columns using pandas in python
def dt_parse(date_string): dt = pd.datetime.strptime(date_string, '%d %m %Y') return dt df = pd.read_csv(fname, header=None, index_col='datetime', parse_dates={'datetime': [0,1,2,4,5,6]}, date_parser=lambda x: pd.datetime.strptime(x, '%d %m %Y')) date_parser=dt_pasre)
and here is part of the error
File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
line 1763, in _try_convert_dates
colnames.append(str(columns[c]))
IndexError: list index out of range
b) Another try, from SO post:
https://stackoverflow.com/questions/11615504/parse-dates-when-yyyymmdd-and-hh-are-in-separate-columns-using-pandas-in-python
parse = lambda x: datetime.strptime(x, '%d%m%Y')
pd.read_table("./temps0.txt", parse_dates = [['DD MM YYYY HH mm ss']],
index_col = 0,
date_parser=parse)
and here is part of the error
File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
line 1009, in _set
self._reader.set_noconvert(names.index(x))
ValueError: 'DD MM YYYY HH mm ss' is not in list
c) Another try, from SO post:
https://stackoverflow.com/questions/17301589/parsing-dd-mm-yy-hh-mm-ss-columns-from-txt-file-using-pythons-pandas?lq=1
def date_parser(ss):
day, month, year, hour, min, sec = ss.split()
return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))
df = pd.read_csv('temps0.txt', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)
and here is part of the error
File "<stdin>", line 2, in date_parser
ValueError: too many values to unpack
Please excuse my ignorance as am trying to learn mostly through SO posts (the official documentation lacking completely in examples for clarity).
Upvotes: 2
Views: 7660
Reputation: 375925
In Chang's answer he used a parser, which is the bit you need to tweak to match the date format of your date strings:
import datetime
parse = lambda x: datetime.strptime(x, '%d %m %Y %H %M %S')
You also need to tweak the column names, which in that example were similar to the format, which was a bit confusing. Here we have no columns names so we can use numbers (for the position of the column, and tell read_csv that using header=None):
from StringIO import StringIO
csv = '''01 05 2013 00 00 00,26.4,16.5
02 05 2013 00 00 00,25.9,17.7
03 05 2013 00 00 00,26.6,17.4'''
Note: we use header=None, since there are no column names.
df = pd.read_csv(StringIO(csv), parse_dates=[0],
index_col=0,
date_parser=parse,
header=None)
In [11]: df
Out[11]:
1 2
0
2013-05-01 26.4 16.5
2013-05-02 25.9 17.7
2013-05-03 26.6 17.4
If your data is separated by multiple spaces (more than two) rather than commas, then use the sep argument:
csv = '''01 05 2013 00 00 00 26.4 16.5
02 05 2013 00 00 00 25.9 17.7
03 05 2013 00 00 00 26.6 17.4'''
In [21]: pd.read_csv(StringIO(csv), parse_dates = [0],
index_col = 0,
date_parser=parse,
header=None, sep='\s\s+')
Out[21]:
1 2
0
2013-05-01 26.4 16.5
2013-05-02 25.9 17.7
2013-05-03 26.6 17.4
Note: the io docs go into a lot of detail, with many examples.
Upvotes: 5