I currently parse a text file with the following:

import datetime
import pandas as pd

f = lambda s: datetime.datetime.strptime(s, '%Y-%m-%d-%H-%M-%S')
dframe = pd.read_csv(
    fname, sep=' ', header=None,
    names=('A', 'B', 'C', 'D', 'E'),
    use_unsigned=True, parse_dates=True, index_col=0, date_parser=f)
which takes about 5.70 s for a single file.
Can I speed up the datetime parsing?
A line from the file looks like:
2015-04-08-11-23-27 12420.8 12430.3 12527.0 12394.2 A
Thanks,
Upvotes: 2
Views: 843
You should be able to speed it up considerably by calling to_datetime
manually after reading, instead of passing your lambda as date_parser:
>>> %time df = pd.read_csv(fname, delim_whitespace=True, header=None, names=('A', 'B', 'C', 'D', 'E'), use_unsigned=True, parse_dates=True, index_col=0, date_parser=f)
CPU times: user 9.16 s, sys: 39.9 ms, total: 9.2 s
Wall time: 9.2 s
vs.
>>> %time df2 = pd.read_csv(fname, delim_whitespace=True, header=None, names=('A', 'B', 'C', 'D', 'E'), use_unsigned=True, parse_dates=False, index_col=0)
CPU times: user 416 ms, sys: 20 ms, total: 436 ms
Wall time: 435 ms
>>> %time df2.index = pd.to_datetime(df2.index, format="%Y-%m-%d-%H-%M-%S")
CPU times: user 2.72 s, sys: 4 ms, total: 2.72 s
Wall time: 2.72 s
>>>
>>> df.equals(df2)
True
>>> (2.72+0.435)/9.2
0.3429347826086957
(I'm using delim_whitespace=True because that tends to be modestly
faster in situations like this.)
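For reference, here is a minimal, self-contained sketch of the same two-step approach on recent pandas versions, where use_unsigned is gone and date_parser is deprecated. The sample rows and the 'ts' index name are made up for illustration, and sep=r'\s+' stands in for delim_whitespace=True:

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the file in the question.
data = io.StringIO(
    "2015-04-08-11-23-27 12420.8 12430.3 12527.0 12394.2 A\n"
    "2015-04-08-11-23-28 12421.1 12431.0 12528.3 12395.0 B\n"
)

# Step 1: read without date parsing, so the fast CSV reader does no
# per-row datetime work; the timestamp column stays a plain string index.
df2 = pd.read_csv(data, sep=r'\s+', header=None,
                  names=('ts', 'A', 'B', 'C', 'D', 'E'), index_col=0)

# Step 2: convert the whole index in one vectorized call; the explicit
# format string skips per-element format inference.
df2.index = pd.to_datetime(df2.index, format='%Y-%m-%d-%H-%M-%S')

print(df2.index.dtype)
```

The explicit format matters: without it, to_datetime has to guess the layout, which is where much of the per-row cost in the original date_parser lambda came from.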
Upvotes: 3