Reputation: 1327
I have imported the following data from a CSV file:
I have successfully converted the "object" in the first column to a datetime using:
df = pd.read_csv("myfile.csv", names=['DateTime','Freq'])
df['DateTime'] = pd.to_datetime(df['DateTime'], errors='coerce')
The problem is that it's a very big CSV file (35 million rows), so this is dog slow. Is there a more efficient way of converting the first column to datetime?
I would also like to split the date and the time into separate columns.
Upvotes: 1
Views: 188
Reputation: 91007
Yes, you can do it in the read_csv()
function itself: use the parse_dates
argument and pass it the list of columns to parse as dates. Example -
df = pd.read_csv("myfile.csv", names=['DateTime','Freq'], parse_dates=['DateTime'])
Demo -
In [40]: import pandas as pd
In [41]: import io
In [42]: s = """Date, SomeNum
....: 01/01/2014 00:00:00, 50.031
....: 01/01/2014 00:00:01, 50.026
....: 01/01/2014 00:00:02, 50.019
....: 01/01/2014 00:00:03, 50.008"""
In [43]: df = pd.read_csv(io.StringIO(s),parse_dates=['Date'])
In [44]: df
Out[44]:
                 Date   SomeNum
0 2014-01-01 00:00:00    50.031
1 2014-01-01 00:00:01    50.026
2 2014-01-01 00:00:02    50.019
3 2014-01-01 00:00:03    50.008
In [45]: df['Date']
Out[45]:
0 2014-01-01 00:00:00
1 2014-01-01 00:00:01
2 2014-01-01 00:00:02
3 2014-01-01 00:00:03
Name: Date, dtype: datetime64[ns]
Timing results of the two methods for a CSV with 1 million records -
In [92]: def func1():
....:     df = pd.read_csv('a.csv', names=['DateTime','Freq'])
....:     df['DateTime'] = pd.to_datetime(df['DateTime'], errors='coerce', format='%d/%m/%Y %H:%M:%S')
....:     return df
....:
In [96]: def func2():
....:     return pd.read_csv('a.csv', names=['DateTime','Freq'], parse_dates=['DateTime'])
....:
In [97]: %timeit func1()
1 loops, best of 3: 6.5 s per loop
In [98]: %timeit func2()
1 loops, best of 3: 652 ms per loop
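As for splitting the date and the time into separate columns (the second part of the question), one option is the .dt accessor on the already-parsed datetime64 column. A minimal sketch, assuming you want new columns named 'Date' and 'Time' (hypothetical names):

df = pd.read_csv('myfile.csv', names=['DateTime','Freq'], parse_dates=['DateTime'])

# .dt.date and .dt.time return Python date/time objects, so the new
# columns will have dtype 'object' rather than datetime64.
df['Date'] = df['DateTime'].dt.date
df['Time'] = df['DateTime'].dt.time

Note that .dt only works on a datetime64 column, which is exactly what parse_dates gives you.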
Upvotes: 1