Prevent Pandas read_csv from truncating full timestamp

Question

I'm using Pandas 0.11 on Mac OS X. I'm trying to import a CSV file with pandas read_csv. One of the columns in the file is the full timestamp, with values like:

fullts
1374087067.357464
1374087067.256206
1374087067.158231
1374087067.074162

I'm interested in getting the time difference between consecutive timestamps, so I import the file specifying the dtype:

    data = read_csv(fn, dtype={'fullts': float64})

However, pandas seems to truncate the number to its integer part:

    data.fullts.head(4)

yields:

Any suggestions?

Thanks!

Added: I tried using pd.to_datetime as suggested, and got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
---> 1 pd.to_datetime(sd1.fullts)

/Users/user/anaconda/lib/python2.7/site-packages/pandas-0.11.0-py2.7-macosx-10.5-x86_64.egg/pandas/tseries/tools.pyc in to_datetime(arg, errors, dayfirst, utc, box, format)
    102         values = arg.values
    103         if not com.is_datetime64_dtype(values):
--> 104             values = _convert_f(values)
    105         return Series(values, index=arg.index, name=arg.name)
    106     elif isinstance(arg, (np.ndarray, list)):

/Users/user/anaconda/lib/python2.7/site-packages/pandas-0.11.0-py2.7-macosx-10.5-x86_64.egg/pandas/tseries/tools.pyc in _convert_f(arg)
     84             else:
     85                 result = tslib.array_to_datetime(arg, raise_=errors == 'raise',
---> 86                                                  utc=utc, dayfirst=dayfirst)
     87             if com.is_datetime64_dtype(result) and box:
     88                 result = DatetimeIndex(result, tz='utc' if utc else None)
/Users/user/anaconda/lib/python2.7/site-packages/pandas-0.11.0-py2.7-macosx-10.5-x86_64.egg/pandas/tslib.so in pandas.tslib.array_to_datetime (pandas/tslib.c:15411)()

TypeError: object of type 'float' has no len()

Andy Hayden · Accepted Answer

You don't need to specify the dtype when reading from csv (it should use float64 by default).
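A minimal sketch of how to check this, assuming a recent pandas and an in-memory stand-in for the CSV file (the `csv_text` string is made up from the sample values in the question). It suggests the fractional part is kept in the stored values and that the shortened form may be only how the column is displayed:

```python
import io

import pandas as pd

# Hypothetical stand-in for the file on disk, built from the question's sample values.
csv_text = """fullts
1374087067.357464
1374087067.256206
1374087067.158231
1374087067.074162
"""

# No dtype argument: the column is inferred as float64.
df = pd.read_csv(io.StringIO(csv_text))
print(df["fullts"].dtype)

# Inspect a raw value directly instead of relying on the column's printed form.
first = df["fullts"].iloc[0]
print(repr(first))

# Temporarily widen the float display to see the fractional part in the printed column.
with pd.option_context("display.float_format", "{:.6f}".format):
    print(df["fullts"].head(4))
```

If `repr(first)` shows the microseconds, nothing was lost on import and only the display format needs adjusting.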

In pandas 0.12 you can then convert columns of integers or floats (of epoch times) into pandas Timestamps using the unit argument of to_datetime:

In [11]: df
Out[11]:
         fullts
0  1.374087e+09
1  1.374087e+09
2  1.374087e+09
3  1.374087e+09

In [12]: pd.to_datetime(df.fullts)  # default unit is ns
Out[12]:
0   1970-01-01 00:00:01.374087067
1   1970-01-01 00:00:01.374087067
2   1970-01-01 00:00:01.374087067
3   1970-01-01 00:00:01.374087067
Name: fullts, dtype: datetime64[ns]

In [13]: pd.to_datetime(df.fullts, unit='s')
Out[13]:
0   2013-07-17 18:51:07.357464
1   2013-07-17 18:51:07.256206
2   2013-07-17 18:51:07.158231
3   2013-07-17 18:51:07.074162
Name: fullts, dtype: datetime64[ns]

The docstring states:

unit : unit of the arg (D,s,ms,us,ns) denote the unit in epoch
(e.g. a unix timestamp), which is an integer/float number
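For the original goal of getting the time difference between consecutive timestamps, here is a rough sketch assuming a recent pandas. The DataFrame is rebuilt by hand from the sample values in the question, and the variable names are illustrative only. It shows two routes: differencing the raw floats, and differencing after converting with `unit='s'`:

```python
import pandas as pd

# Sample values copied from the question; in practice this column comes from read_csv.
df = pd.DataFrame({
    "fullts": [
        1374087067.357464,
        1374087067.256206,
        1374087067.158231,
        1374087067.074162,
    ]
})

# Route 1: difference the epoch floats directly; the result is in seconds.
delta_seconds = df["fullts"].diff()

# Route 2: convert to Timestamps first, then difference to get Timedeltas.
ts = pd.to_datetime(df["fullts"], unit="s")
delta_td = ts.diff()

# Timedeltas can be turned back into float seconds if needed.
delta_from_td = delta_td.dt.total_seconds()

print(delta_seconds)
print(delta_td)
```

The first element of each result is missing (NaN or NaT) because the first row has no predecessor. The sample data is in descending order, so the differences come out negative.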
