Reputation: 179
I'm exploring Pandas - trying to learn and apply it. Currently I have a csv file populated with a financial timeseries data of following structure:
date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473
2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556
2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615
Now I want to convert the csv data into a pandas DataFrame object, so that date and time fields merge and become the DateTimeIndex of the DataFrame like this:
df = pa.read_csv(path,
names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
parse_dates = {'dateTime': ['date', 'time']},
index_col = 'dateTime')
This works yielding a nice DataFrame object:
<class 'pandas.core.frame.DataFrame'>
Index: 8676 entries, 2003.04.08 12:00 to nan nan
Data columns (total 5 columns):
open 8675 non-null values
high 8675 non-null values
low 8675 non-null values
close 8675 non-null values
vol 8675 non-null values
dtypes: float64(5)
But upon inspection it turns out that the Index is not a DataTimeIndex but unicode strings instead:
type(df.index)
>>> pandas.core.index.Index
df.index
>>> Index([u'2003.04.08 12:00', u'2003.04.08 13:00', u'2003.04.08 14:00', ....
So read_csv
parsed the date and time fields, merged them but did not create a DateTimeIndex. As far as I understood from the documentation a new datastructure object supplied with a list of datetime objects should automatically create a DateTimeIndex. Am I wrong? Is the DataFrame object an exception?
I also tried to convert the current index like this:
df.index = pa.to_datetime(df.index)
but no changes have been made to the index and it is still in unicode format. I begin to suspect the default parsing functions aren't doing their job, but I don't get any error messages from them.
How to get a working DateTimeIndex in a DateFrame in this situation?
Solution:
df = pa.read_csv(path,
names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
parse_dates={'datetime':['date','time']},
keep_date_col = True,
index_col='datetime'
)
now apply the lambda function, doing what the parser should have done:
df['datetime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)
Upvotes: 5
Views: 17052
Reputation: 394169
Dateutil is unable to parse your data correctly but you can do it after loading like so using strptime
:
import datetime
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)
This will yield you the 'DateTime' column as datetime64[ns]
and you can use it as your index
EDIT
Hmm.. interestingly when I do this it works:
df = pd.read_csv(r'c:\data\temp.txt', parse_dates={'datetime':['date','time']}, index_col='datetime')
Could you see what happens when you drop the column names from the parameters to read_csv
Upvotes: 7