Zilore Mumba
Zilore Mumba

Reputation: 1570

parsing datestring in pandas

My sincere apologies for asking a question which has already been comprehensively resolved more than once. My problem is that I have tried to copy-paste, as is, examples from SO posts but noe work for me. I have data in the form "dd mm YYYY HH mm ss value value". I want to read this as a pandas dataframe with the first column as a datestring. Here are the examples I tried from SO posts: 1. first sample data

01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4
......

The zeroes for HH mm ss were added just to conform exactly to the examples.

  1. Secondly the packages imported

    from datetime import datetime import pandas as pd import os from cStringIO import StringIO

    fname = os.path.expanduser('./temps0.txt')

  2. Now the three examples I tried

    a) from SO post:

    Parse dates when year month day and hour are in separate columns using pandas in python

    def dt_parse(date_string): dt = pd.datetime.strptime(date_string, '%d %m %Y') return dt df = pd.read_csv(fname, header=None, index_col='datetime', parse_dates={'datetime': [0,1,2,4,5,6]}, date_parser=lambda x: pd.datetime.strptime(x, '%d %m %Y')) date_parser=dt_pasre)

and here is part of the error

File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
 line 1763, in _try_convert_dates
    colnames.append(str(columns[c]))
IndexError: list index out of range

b) Another try, from SO post:

https://stackoverflow.com/questions/11615504/parse-dates-when-yyyymmdd-and-hh-are-in-separate-columns-using-pandas-in-python

parse = lambda x: datetime.strptime(x, '%d%m%Y')
pd.read_table("./temps0.txt",  parse_dates = [['DD MM YYYY HH mm ss']],
                               index_col = 0,
                               date_parser=parse)

and here is part of the error

File "/home/zmumba/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py",
 line 1009, in _set
    self._reader.set_noconvert(names.index(x))
ValueError: 'DD MM YYYY HH mm ss' is not in list

c) Another try, from SO post:

https://stackoverflow.com/questions/17301589/parsing-dd-mm-yy-hh-mm-ss-columns-from-txt-file-using-pythons-pandas?lq=1
def date_parser(ss):
    day, month, year, hour, min, sec = ss.split()
    return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))
df = pd.read_csv('temps0.txt', header=None, sep='\s+\s', parse_dates=[[0]], date_parser=date_parser)

and here is part of the error

File "<stdin>", line 2, in date_parser
ValueError: too many values to unpack

Please excuse my ignorance as am trying to learn mostly through SO posts (the official documentation lacking completely in examples for clarity).

Upvotes: 2

Views: 7660

Answers (1)

Andy Hayden
Andy Hayden

Reputation: 375925

In Chang's answer he used a parser, which is the bit you need to tweak to match the date format of your date strings:

import datetime
parse = lambda x: datetime.strptime(x, '%d %m %Y %H %M %S')

You also need to tweak the column names, which in that example were similar to the format, which was a bit confusing. Here we have no columns names so we can use numbers (for the position of the column, and tell read_csv that using header=None):

from StringIO import StringIO
csv = '''01 05 2013 00 00 00,26.4,16.5
02 05 2013 00 00 00,25.9,17.7
03 05 2013 00 00 00,26.6,17.4'''

Note: we use header=None, since there are no column names.

df = pd.read_csv(StringIO(csv), parse_dates=[0],
                                index_col=0,
                                date_parser=parse,
                                header=None)
In [11]: df
Out[11]:
               1     2
0                     
2013-05-01  26.4  16.5
2013-05-02  25.9  17.7
2013-05-03  26.6  17.4

If your data is separated by multiple spaces (more than two) rather than commas, then use the sep argument:

csv = '''01 05 2013 00 00 00     26.4    16.5
02 05 2013 00 00 00     25.9    17.7
03 05 2013 00 00 00     26.6    17.4'''

In [21]: pd.read_csv(StringIO(csv), parse_dates = [0],
                           index_col = 0,
                           date_parser=parse,
                           header=None, sep='\s\s+')
Out[21]: 
               1     2
0                     
2013-05-01  26.4  16.5
2013-05-02  25.9  17.7
2013-05-03  26.6  17.4

Note: the io docs go into a lot of detail, with many examples.

Upvotes: 5

Related Questions