Reputation: 35716
I'm trying to convert some date-time data with pandas.to_datetime(). It is not working: the dtype of df['Time'] is still object. What is wrong?
Please note that I have attached my time file.
My Code
import pandas as pd
import numpy as np
from datetime import datetime
f = open('time','r')
lines = f.readlines()
t = []
for line in lines:
    # Second whitespace-separated field, last 20 characters
    time = line.split()[1][-20:]
    # Join the date part and the time part with a space
    time2 = time[:11] + ' ' + time[12:21]
    t.append(time2)
df = pd.DataFrame(t)
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
print(df['Time'])
Name: Time, Length: 16136, dtype: object
Please find the attached time data file here.
Upvotes: 2
Views: 2374
Reputation: 369244
The file time contains some invalid data. For example, line 8323 contains 8322 "5/Jul/2013::8:25:18 0530", which differs from a normal line such as 8321 "15/Jul/2013:18:25:18 +0530":
8321 "15/Jul/2013:18:25:18 +0530"
8322 "5/Jul/2013::8:25:18 0530"
For a normal line, time2 becomes 15/Jul/2013 18:25:18, but for the invalid line it becomes "5/Jul/2013::8:25:18 (note the stray leading quote and the doubled colon):
15/Jul/2013 18:25:18
"5/Jul/2013::8:25:18
As a result, some lines can be parsed as datetimes and some cannot, so the data is coerced to object dtype (which can hold both datetimes and strings).
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '15/Jul/2013 18:25:18']))
0 2013-07-15 18:25:18
1 2013-07-15 18:25:18
dtype: datetime64[ns]
>>> pd.Series(pd.to_datetime(['15/Jul/2013 18:25:18', '*5/Jul/2013 18:25:18']))
0 15/Jul/2013 18:25:18
1 *5/Jul/2013 18:25:18
dtype: object
If you take only the first 5 entries from the file (which have the correct date format), you will get what you expected.
...
df = pd.DataFrame(t[:5])
df.columns = ['Time']
df['Time'] = pd.to_datetime(df['Time'])
The above code yields:
0 2013-07-15 00:00:12
1 2013-07-15 00:00:18
2 2013-07-15 00:00:23
3 2013-07-15 00:00:27
4 2013-07-15 00:00:29
Name: Time, dtype: datetime64[ns]
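To find which entries are malformed rather than just slicing them off, one option is to pass errors='coerce' together with an explicit format, so that unparseable strings become NaT and the column still ends up with a datetime dtype. This is only a minimal sketch: the list t below is made-up sample data mirroring the values discussed above, not the contents of the real file, and the format string is an assumption based on the 15/Jul/2013 18:25:18 pattern.

```python
import pandas as pd

# Hypothetical sample values mirroring the ones discussed above
# (not read from the real 'time' file).
t = ['15/Jul/2013 18:25:18', '"5/Jul/2013::8:25:18', '15/Jul/2013 18:25:19']
df = pd.DataFrame({'Time': t})

# errors='coerce' turns strings that do not match the format into NaT
# instead of leaving the whole column as object dtype.
parsed = pd.to_datetime(df['Time'], format='%d/%b/%Y %H:%M:%S', errors='coerce')

# Show the original strings that failed to parse, keyed by row position.
bad_rows = df.loc[parsed.isna(), 'Time']
print(bad_rows)

# Replace the column with the parsed values; invalid rows are now NaT.
df['Time'] = parsed
print(df['Time'].dtype)
```

The index of bad_rows tells you which rows of the input to inspect or repair; whether to drop them or fix the source file depends on your data.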
UPDATE
Added a small example that shows why the dtype is object rather than datetime.
Upvotes: 3