serega
serega

Reputation: 405

pandas to_datetime from milliseconds produces incorrect datetime

A data frame with a column dateMillis that contains timestamp as milliseconds from epoch

 In [5]: df.head(5)
    Out[5]: 
       custId     dateMillis
    0   31403  1488232800000
    1   28197  1488232800013
    2   30158  1488232800026
    3   28538  1488232800030
    4   28538  1488232800033

If I convert the value of dateMillis in the first row to datetime using python's datetime I get the following

 In [8]: print datetime.datetime.fromtimestamp(df.ix[0]['dateMillis']/1000.0)
    2017-02-27 17:00:00

which is correct. But if I convert the column dateMillis using panda's to_datetime I get

In [11]: df['date'] = pd.to_datetime(df['dateMillis'], unit='ms')
In [12]: df.head(5)
Out[12]: 
   custId     dateMillis                    date
0   31403  1488232800000 2017-02-27 22:00:00.000
1   28197  1488232800013 2017-02-27 22:00:00.013
2   30158  1488232800026 2017-02-27 22:00:00.026
3   28538  1488232800030 2017-02-27 22:00:00.030
4   28538  1488232800033 2017-02-27 22:00:00.033

which is five hours ahead. The machine itself has EST time zone, so it seems like pandas converts datetime to the local time zone.

 In [15]: time.tzname
    Out[15]: ('EST', 'EDT')

I would like to_datetime to be consistent with datetime.fromtimestamp. I tried various values (True/False/None) for utc argument, but it didn't help.

Upvotes: 2

Views: 3760

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210882

UNIX timestamp (epoch) per default doesn't know anything of any time zone.

"This count starts at the Unix Epoch on January 1st, 1970 at UTC`.

But you can use the following workaround:

In [47]: datetime.datetime.fromtimestamp(df.loc[0, 'dateMillis']/1000)
Out[47]: datetime.datetime(2017, 2, 27, 23, 0)   # 23:00:00

In UTC TZ:

In [48]: pd.to_datetime(df['dateMillis'], unit='ms')
Out[48]:
0   2017-02-27 22:00:00.000
1   2017-02-27 22:00:00.013
2   2017-02-27 22:00:00.026
3   2017-02-27 22:00:00.030
4   2017-02-27 22:00:00.033
Name: dateMillis, dtype: datetime64[ns]

Solution:

In [51]: from tzlocal import get_localzone

In [52]: mytz = get_localzone()

In [53]: mytz
Out[53]: <DstTzInfo 'Europe/Berlin' LMT+0:53:00 STD>

In [54]: pd.to_datetime(df['dateMillis'], unit='ms').dt.tz_localize('UTC').dt.tz_convert(mytz)
Out[54]:
0          2017-02-27 23:00:00+01:00
1   2017-02-27 23:00:00.013000+01:00
2   2017-02-27 23:00:00.026000+01:00
3   2017-02-27 23:00:00.030000+01:00
4   2017-02-27 23:00:00.033000+01:00
Name: dateMillis, dtype: datetime64[ns, Europe/Berlin]

Upvotes: 3

Related Questions