Reputation: 165
I have a column containing times as epoch milliseconds, for example 1359699060370. I have around a million rows. Right now I'm using
df['datetime'] = pd.to_datetime(df['Real_First_Packet'], unit = 'ms')
I'm using this 'datetime' column to create new columns, such as one for the date, one for the hour, and so on, in the following way:
df['day'] = df['datetime'].dt.day
But pd.to_datetime returns the datetime in UTC (GMT). I need it in local time, so I used the following code:
df['datetime'] = pd.DatetimeIndex(pd.to_datetime(df['Real_First_Packet'],unit='ms')).tz_localize('UTC').tz_convert('US/Eastern')
This takes a fair amount of time for a million rows. Is there a better approach than the one above?
Upvotes: 2
Views: 95
Reputation: 12038
There's no need for a separate tz_localize step: passing utc=True to to_datetime already gives you a timezone-aware UTC result.
Modify your code to the following:
df['datetime'] = pd.to_datetime(df['Real_First_Packet'], unit='ms', utc=True).dt.tz_convert('US/Eastern')
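The derived columns then come out in Eastern time. A minimal sketch using the sample value from the question (the column names mirror the ones above):

import pandas as pd

# sample frame with one epoch-millisecond value from the question
df = pd.DataFrame({'Real_First_Packet': [1359699060370]})

# parse epoch milliseconds as tz-aware UTC, then convert to Eastern time
df['datetime'] = pd.to_datetime(df['Real_First_Packet'], unit='ms', utc=True).dt.tz_convert('US/Eastern')

# derived columns now reflect local (Eastern) time rather than UTC
df['day'] = df['datetime'].dt.day
df['hour'] = df['datetime'].dt.hour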
You can also improve performance by processing the data in chunks (sketched below) or by caching repeated conversions. At around 2M rows, I'd consider a map/reduce tool such as Hadoop or PySpark.
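If the raw data comes from a file, a chunked conversion might look like this (a sketch only; the file name packets.csv and the chunk size are assumptions, not from the question):

import pandas as pd

chunks = []
# read and convert the file in pieces to keep memory use bounded
for chunk in pd.read_csv('packets.csv', chunksize=100_000):
    chunk['datetime'] = pd.to_datetime(chunk['Real_First_Packet'], unit='ms', utc=True).dt.tz_convert('US/Eastern')
    chunks.append(chunk)

# reassemble the converted pieces into a single frame
df = pd.concat(chunks, ignore_index=True)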
Upvotes: 1