Reputation: 165
I have a column containing times as epoch milliseconds, for example 1359699060370. I have around a million rows. Right now I'm using
df['datetime'] = pd.to_datetime(df['Real_First_Packet'], unit = 'ms')
I'm using this 'datetime' column to create new columns, such as one for the date, one for the hour, and so on, in the following way:
df['day'] = df['datetime'].dt.day
But pd.to_datetime returns the datetime in UTC (GMT). I need it in local time, so I used the following code:
df['datetime'] = pd.DatetimeIndex(pd.to_datetime(df['Real_First_Packet'],unit='ms')).tz_localize('UTC').tz_convert('US/Eastern')
This takes a fair amount of time for a million rows. Is there a better approach than the one above?
Upvotes: 2
Views: 95
Reputation: 12038
There's no need for a separate tz_localize step: passing utc=True to to_datetime already gives you a timezone-aware UTC result.
Modify your code to the following:
df['datetime'] = pd.to_datetime(df['Real_First_Packet'], unit='ms', utc=True).dt.tz_convert('US/Eastern')
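The derived columns then come out in Eastern time. A minimal sketch using the sample value from the question (the column names mirror the ones above):

import pandas as pd

# sample frame with one epoch-millisecond value from the question
df = pd.DataFrame({'Real_First_Packet': [1359699060370]})

# parse epoch milliseconds as tz-aware UTC, then convert to Eastern time
df['datetime'] = pd.to_datetime(df['Real_First_Packet'], unit='ms', utc=True).dt.tz_convert('US/Eastern')

# derived columns now reflect local (Eastern) time rather than UTC
df['day'] = df['datetime'].dt.day
df['hour'] = df['datetime'].dt.hour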
You can also improve performance by processing the data in chunks (sketched below) or by caching repeated conversions. At around 2M rows, I'd consider a map/reduce tool such as Hadoop or PySpark.
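If the raw data comes from a file, a chunked conversion might look like this (a sketch only; the file name packets.csv and the chunk size are assumptions, not from the question):

import pandas as pd

chunks = []
# read and convert the file in pieces to keep memory use bounded
for chunk in pd.read_csv('packets.csv', chunksize=100_000):
    chunk['datetime'] = pd.to_datetime(chunk['Real_First_Packet'], unit='ms', utc=True).dt.tz_convert('US/Eastern')
    chunks.append(chunk)

# reassemble the converted pieces into a single frame
df = pd.concat(chunks, ignore_index=True)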
Upvotes: 1