SlowLearner
SlowLearner

Reputation: 7997

Setting timezone for timestamp data in pandas dataframe

I have a frequently changing pandas dataframe of data that looks like this:

         date name  time       timezone
0  2016-08-01  aaa  0900     Asia/Tokyo
1  2016-08-04  bbb  1200  Europe/Berlin
2  2016-08-05  ccc  1400  Europe/London

The date, time and timezone refer to a delivery date that is often for an overseas location, with the name being the name of the client company.

The plan is to take this data and create a datetime_local column that incorporates the time zone shown in the timezone column of dataframe. Then from that I want to add a column datetime_london that contains that date and time, but expressed in terms of the time and date in London.

I have got most of the way there but when calling tz_localize I end up with a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() which suggests to me that I'm not treating the column with the timezone correctly.

Any suggestions as to how to proceed?

mydf = pd.DataFrame(data={'date':['2016-08-01','2016-08-04','2016-08-05'],
                          'time':['0900','1200','1400'],
                          'timezone':['Asia/Tokyo','Europe/Berlin','Europe/London'],
                          'name':['aaa','bbb','ccc']}
)
print(mydf)
mydf["datetime"] = mydf["date"].map(str) + " " + mydf["time"]
mydf.datetime = pd.to_datetime(mydf.datetime)
mydf.index = mydf.datetime
print(mydf)
mydf["datetime_local"] = mydf.datetime
mydf.datetime_local.tz_localize(mydf.timezone)

Upvotes: 6

Views: 15486

Answers (3)

Felipe Muniz
Felipe Muniz

Reputation: 361

While the previous answers work perfectly, when faced with this problem dealing with big data, the apply method was a bit inefficient (for a 10^6 order of magnitude of rows). The apply method processing row by row was taking me 10-15 minutes dealing with that scale. If you have a really small ratio of unique values for the timezone col, compared to the number of rows in your dataframe, this code is a lot more efficient:

for tz in df['timezone'].unique():    #iterates over each unique value of timezone in the dataframe
    mask = df['timezone'] == tz       #selects all rows with current unique tz value
    df.loc[mask,'datetime_local'] = pd.DatetimeIndex(df.loc[mask, 'datetime']).tz_localize('UTC').tz_convert(tz)

The last line of code converts the datetime column to DatetimeIndex, which makes that datetime UTC and tz-naive (if it wasn't already). Therefore, tz_localize('UTC') is required, since tz_convert does not work with tz-naive datetimes.

I know it's an old question but I really needed a faster solution for this and I think my answer might help people that come after me.

Upvotes: 5

Max
Max

Reputation: 458

import pandas as pd

def convert_to_local_time(row):
    return pd.to_datetime(row.datetime).tz_localize(row.timezone)

def convert_to_london_time(row):
    return pd.to_datetime(row.datetime_local).tz_convert('Europe/London')

mydf = pd.DataFrame(data={'date':['2016-08-01','2016-08-04','2016-08-05'],
                          'time':['0900','1200','1400'],
                          'timezone':['Asia/Tokyo','Europe/Berlin','Europe/    London'],
                          'name':['aaa','bbb','ccc']}
)
print(mydf)

Output:

         date name  time       timezone
0  2016-08-01  aaa  0900     Asia/Tokyo
1  2016-08-04  bbb  1200  Europe/Berlin
2  2016-08-05  ccc  1400  Europe/London

Add datetime_local column

mydf["datetime"] = mydf["date"].map(str) + " " + mydf["time"]
mydf['datetime_local'] = mydf.apply(convert_to_local_time, axis=1)
print(mydf)

Output:

         date name  time       timezone         datetime  \
0  2016-08-01  aaa  0900     Asia/Tokyo  2016-08-01 0900   
1  2016-08-04  bbb  1200  Europe/Berlin  2016-08-04 1200   
2  2016-08-05  ccc  1400  Europe/London  2016-08-05 1400   

              datetime_local  
0  2016-08-01 09:00:00+09:00  
1  2016-08-04 12:00:00+02:00  
2  2016-08-05 14:00:00+01:00  

Add datetime_london column

mydf['datetime_london'] = mydf.apply(convert_to_london_time, axis=1)
print('After adding datetime_london:')
print(mydf)

Output:

         date name  time       timezone         datetime  \
0  2016-08-01  aaa  0900     Asia/Tokyo  2016-08-01 0900   
1  2016-08-04  bbb  1200  Europe/Berlin  2016-08-04 1200   
2  2016-08-05  ccc  1400  Europe/London  2016-08-05 1400   

              datetime_local           datetime_london  
0  2016-08-01 09:00:00+09:00 2016-08-01 01:00:00+01:00  
1  2016-08-04 12:00:00+02:00 2016-08-04 11:00:00+01:00  
2  2016-08-05 14:00:00+01:00 2016-08-05 14:00:00+01:00  

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

try this:

In [12]: mydf.apply(lambda x: x.datetime_local.tz_localize(x.timezone), axis=1)
Out[12]:
datetime
2016-08-01 09:00:00    2016-08-01 09:00:00+09:00
2016-08-04 12:00:00    2016-08-04 12:00:00+02:00
2016-08-05 14:00:00    2016-08-05 14:00:00+01:00
dtype: object

Upvotes: 1

Related Questions