Reputation: 2881
I have ASCII files with a rather odd timestamp:
DATAH DATE TIME SECONDS NANOSECONDS D
DATA 2012-06-04 23:49:15 1338853755 700000000 0.00855577
DATA 2012-06-04 23:49:15 1338853755 800000000 0.00805482
DATA 2012-06-04 23:49:15 1338853755 900000000 -0.00537284
DATA 2012-06-04 23:49:16 1338853756 0 -0.0239447
Basically the timestamp is divided into 4 columns - DATE, TIME, SECONDS and NANOSECONDS.
I'd like to read the file as a pandas DataFrame, with the DATE, TIME and NANOSECONDS columns parsed as datetime objects and used as the index:
import datetime as dt
import pandas as pd
parse = lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S %f')
df = pd.read_csv('data.txt', sep='\t', parse_dates=[['DATE', 'TIME', 'NANOSECONDS']], index_col=0, date_parser=parse)
But this fails, because nanoseconds values have 9 digits instead of 6 as required by the %f format. The above code works if I manually remove the 3 extra zeroes from the values in NANOSECONDS column.
Could you please show me how I can read in the sample file as a pandas DataFrame object using the DATE, TIME and NANOSECONDS columns as the index?
[UPDATE] Using %f000 as suggested by behzad.nouri works as long as the NANOSECONDS column doesn't contain 0 values, so apparently that is what's causing the problem now.
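A minimal stdlib-only reproduction of the failure: %f greedily consumes up to six digits, so for a nine-digit value it takes the first six and the literal 000 in the format matches the last three, but for a lone 0 it consumes everything and nothing is left for the 000 suffix.

```python
import datetime as dt

fmt = '%Y-%m-%d %H:%M:%S %f000'

# Nine digits: '%f' takes '700000', the literal '000' takes the rest.
ok = dt.datetime.strptime('2012-06-04 23:49:15 700000000', fmt)

# A bare '0': '%f' consumes it and the literal '000' has nothing to
# match, so strptime raises ValueError.
try:
    dt.datetime.strptime('2012-06-04 23:49:16 0', fmt)
except ValueError as e:
    print('failed:', e)
```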
Upvotes: 2
Views: 11389
Reputation: 77981
try:
parse = lambda x: dt.datetime.strptime(x + '0'*(29 - len(x)), '%Y-%m-%d %H:%M:%S %f000')
I think this:
def parse(t):
    import re
    # left-pad the trailing digit run out to 9 digits; '+' rather than '*'
    # so the empty match at end-of-string isn't also replaced (Python 3.7+
    # substitutes empty matches adjacent to a non-empty one)
    t = re.sub('([0-9]+)$', lambda m: '0'*(9 - len(m.group(1))) + m.group(1), t)
    return dt.datetime.strptime(t[:-3], '%Y-%m-%d %H:%M:%S %f')
is safer because it prepends zeros to the number; it makes sure the nanosecond value has 9 digits and then drops the last 3.
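For reference, the padded parser in action (a self-contained sketch; the `+` quantifier is used instead of `*` to avoid a double substitution at end-of-string on Python 3.7+):

```python
import datetime as dt
import re

def parse(t):
    # left-pad the trailing digit run out to 9 digits, then drop the
    # last 3 so '%f' sees exactly 6 digits of microseconds
    t = re.sub('([0-9]+)$', lambda m: '0' * (9 - len(m.group(1))) + m.group(1), t)
    return dt.datetime.strptime(t[:-3], '%Y-%m-%d %H:%M:%S %f')

print(parse('2012-06-04 23:49:15 700000000'))  # 2012-06-04 23:49:15.700000
print(parse('2012-06-04 23:49:16 0'))          # 2012-06-04 23:49:16
```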
Upvotes: 3
Reputation: 129018
This will be much faster than using the read_csv date parser to do this conversion.
In [6]: data = """DATAH DATE TIME SECONDS NANOSECONDS D
...: DATA 2012-06-04 23:49:15 1338853755 700000000 0.00855577
...: DATA 2012-06-04 23:49:15 1338853755 800000000 0.00805482
...: DATA 2012-06-04 23:49:15 1338853755 900000000 -0.00537284
...: DATA 2012-06-04 23:49:16 1338853756 0 -0.0239447"""
In [7]: df = pd.read_csv(StringIO(data), sep='\s+')
In [8]: df
Out[8]:
DATAH DATE TIME SECONDS NANOSECONDS D
0 DATA 2012-06-04 23:49:15 1338853755 700000000 0.008556
1 DATA 2012-06-04 23:49:15 1338853755 800000000 0.008055
2 DATA 2012-06-04 23:49:15 1338853755 900000000 -0.005373
3 DATA 2012-06-04 23:49:16 1338853756 0 -0.023945
[4 rows x 6 columns]
In [9]: df.dtypes
Out[9]:
DATAH object
DATE object
TIME object
SECONDS int64
NANOSECONDS int64
D float64
dtype: object
In [13]: pd.to_datetime(df['SECONDS']+df['NANOSECONDS'].astype(float)/1e9, unit='s')
Out[13]:
0 2012-06-04 23:49:15.700000
1 2012-06-04 23:49:15.800000
2 2012-06-04 23:49:15.900000
3 2012-06-04 23:49:16
dtype: datetime64[ns]
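One caveat with the float division above: float64 cannot represent full nanosecond resolution at epoch-scale magnitudes, so sub-microsecond precision is silently rounded away. A lossless variant (a sketch, keeping integer arithmetic end to end) builds the timestamp from an integer-seconds datetime plus a nanosecond timedelta, and uses it as the index as the question asked:

```python
from io import StringIO

import pandas as pd

data = """DATAH DATE TIME SECONDS NANOSECONDS D
DATA 2012-06-04 23:49:15 1338853755 700000000 0.00855577
DATA 2012-06-04 23:49:15 1338853755 800000000 0.00805482
DATA 2012-06-04 23:49:15 1338853755 900000000 -0.00537284
DATA 2012-06-04 23:49:16 1338853756 0 -0.0239447"""

df = pd.read_csv(StringIO(data), sep=r'\s+')

# integer seconds + integer nanoseconds: no float rounding involved
ts = pd.to_datetime(df['SECONDS'], unit='s') + pd.to_timedelta(df['NANOSECONDS'], unit='ns')
df = df.set_index(ts).drop(columns=['DATE', 'TIME', 'SECONDS', 'NANOSECONDS'])
print(df)
```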
Upvotes: 7