Pandas DateTimeIndex

Question

I need to have a DateTimeIndex for my dataframe. Problem is my source file. The Date header is Date(dd-mm-yy), but the actual date data has the format dd:mm:yy (24:06:1970) etc. I have lots of source files so manually changing the header would be tedious and not good programing practice. How would one go about addressing this from within python?

Perhaps creating a copy of the source file opening it, searching for the date header, changing it and then closing it? I'm new to python so I'm not exactly sure if this is the best way to go about doing things and if it is, how do I implement such code?

Currently I have this;

df = pd.read_csv('test.csv',
                    skiprows = 4,
                    parse_dates = {'stamp':[0,1]},
                    na_values = 'NaN',                    
                    index_col = 'stamp'
                 )

Where column 0 is the date column in question and column 1 is the time column. I don't get any error messages just erroneous data.

Sorry, I should have added a snippet of the csv file in question.I've now provided it below;

some stuff I dont want 
some stuff I dont want 
some stuff I dont want 
some stuff I dont want 
Date(dd-mm-yy),Time(hh:mm:ss),Julian_Day
01:07:2013,05:40:41,182.236586,659,1638.400000
01:07:2013,05:44:03,182.238924,659,1638.400000
01:07:2013,05:47:48,182.241528,659,1638.400000
01:07:2013,05:52:21,182.244687,659,1638.400000

Laurence Billingham · Accepted Answer

I think the main problem is that the header line Date(dd-mm-yy), Time(hh:mm:ss), Julian_Day only appears to specify some of the column names. Pandas cannot infer what to do with the other data.

Try skipping the file's column name line and passing pandas a list of column names and defining your own date_parser:

def my_parser(date, time):
    import datetime
    DATE_FORMAT = '%d:%m:%Y'
    TIME_FORMAT = '%H:%M:%S'
    date = datetime.datetime.strptime(date, DATE_FORMAT)
    time_weird_date = datetime.datetime.strptime(time, TIME_FORMAT)
    return datetime.datetime.combine(date, time_weird_date.time())

import pandas as pd
from cStringIO import StringIO

data = """\
some stuff I dont want 
some stuff I dont want 
some stuff I dont want 
some stuff I dont want 
Date(dd-mm-yy),Time(hh:mm:ss),Julian_Day
01:07:2013,05:40:41,182.236586,659,1638.400000
01:07:2013,05:44:03,182.238924,659,1638.400000
01:07:2013,05:47:48,182.241528,659,1638.400000
01:07:2013,05:52:21,182.244687,659,1638.400000
"""  

pd.read_csv(StringIO(data), skiprows=5, index_col=0, 
            parse_dates={'datetime':['date', 'time']}, 
            names=['date','time', 'Julian_Day', 'col_2', 'col_3'],
            date_parser=my_parser)

This should give you what you want.

As you said you are new to python, I should add that the from cStringIO import StringIO, data = """..., and StringIO(data) parts are just so I could include the data directly in this answer in a runnable form. You just need pd.read_csv(my_data_filename, ... in your own code

Pandas DateTimeIndex

Answers (2)

Related Questions