Reputation: 7997
I have a frequently changing pandas dataframe of data that looks like this:
date name time timezone
0 2016-08-01 aaa 0900 Asia/Tokyo
1 2016-08-04 bbb 1200 Europe/Berlin
2 2016-08-05 ccc 1400 Europe/London
The date, time and timezone refer to a delivery date that is often for an overseas location, with the name being the name of the client company.
The plan is to take this data and create a datetime_local
column that incorporates the time zone shown in the timezone
column of dataframe. Then from that I want to add a column datetime_london
that contains that date and time, but expressed in terms of the time and date in London.
I have got most of the way there but when calling tz_localize
I end up with a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
which suggests to me that I'm not treating the column with the timezone correctly.
Any suggestions as to how to proceed?
mydf = pd.DataFrame(data={'date':['2016-08-01','2016-08-04','2016-08-05'],
'time':['0900','1200','1400'],
'timezone':['Asia/Tokyo','Europe/Berlin','Europe/London'],
'name':['aaa','bbb','ccc']}
)
print(mydf)
mydf["datetime"] = mydf["date"].map(str) + " " + mydf["time"]
mydf.datetime = pd.to_datetime(mydf.datetime)
mydf.index = mydf.datetime
print(mydf)
mydf["datetime_local"] = mydf.datetime
mydf.datetime_local.tz_localize(mydf.timezone)
Upvotes: 6
Views: 15486
Reputation: 361
While the previous answers work perfectly, when faced with this problem dealing with big data, the apply method was a bit inefficient (for a 10^6 order of magnitude of rows). The apply method processing row by row was taking me 10-15 minutes dealing with that scale. If you have a really small ratio of unique values for the timezone col, compared to the number of rows in your dataframe, this code is a lot more efficient:
for tz in df['timezone'].unique(): #iterates over each unique value of timezone in the dataframe
mask = df['timezone'] == tz #selects all rows with current unique tz value
df.loc[mask,'datetime_local'] = pd.DatetimeIndex(df.loc[mask, 'datetime']).tz_localize('UTC').tz_convert(tz)
The last line of code converts the datetime
column to DatetimeIndex
, which makes that datetime
UTC and tz-naive
(if it wasn't already). Therefore, tz_localize('UTC')
is required, since tz_convert
does not work with tz-naive
datetimes.
I know it's an old question but I really needed a faster solution for this and I think my answer might help people that come after me.
Upvotes: 5
Reputation: 458
import pandas as pd
def convert_to_local_time(row):
return pd.to_datetime(row.datetime).tz_localize(row.timezone)
def convert_to_london_time(row):
return pd.to_datetime(row.datetime_local).tz_convert('Europe/London')
mydf = pd.DataFrame(data={'date':['2016-08-01','2016-08-04','2016-08-05'],
'time':['0900','1200','1400'],
'timezone':['Asia/Tokyo','Europe/Berlin','Europe/ London'],
'name':['aaa','bbb','ccc']}
)
print(mydf)
Output:
date name time timezone
0 2016-08-01 aaa 0900 Asia/Tokyo
1 2016-08-04 bbb 1200 Europe/Berlin
2 2016-08-05 ccc 1400 Europe/London
Add datetime_local
column
mydf["datetime"] = mydf["date"].map(str) + " " + mydf["time"]
mydf['datetime_local'] = mydf.apply(convert_to_local_time, axis=1)
print(mydf)
Output:
date name time timezone datetime \
0 2016-08-01 aaa 0900 Asia/Tokyo 2016-08-01 0900
1 2016-08-04 bbb 1200 Europe/Berlin 2016-08-04 1200
2 2016-08-05 ccc 1400 Europe/London 2016-08-05 1400
datetime_local
0 2016-08-01 09:00:00+09:00
1 2016-08-04 12:00:00+02:00
2 2016-08-05 14:00:00+01:00
Add datetime_london
column
mydf['datetime_london'] = mydf.apply(convert_to_london_time, axis=1)
print('After adding datetime_london:')
print(mydf)
Output:
date name time timezone datetime \
0 2016-08-01 aaa 0900 Asia/Tokyo 2016-08-01 0900
1 2016-08-04 bbb 1200 Europe/Berlin 2016-08-04 1200
2 2016-08-05 ccc 1400 Europe/London 2016-08-05 1400
datetime_local datetime_london
0 2016-08-01 09:00:00+09:00 2016-08-01 01:00:00+01:00
1 2016-08-04 12:00:00+02:00 2016-08-04 11:00:00+01:00
2 2016-08-05 14:00:00+01:00 2016-08-05 14:00:00+01:00
Upvotes: 1
Reputation: 210822
try this:
In [12]: mydf.apply(lambda x: x.datetime_local.tz_localize(x.timezone), axis=1)
Out[12]:
datetime
2016-08-01 09:00:00 2016-08-01 09:00:00+09:00
2016-08-04 12:00:00 2016-08-04 12:00:00+02:00
2016-08-05 14:00:00 2016-08-05 14:00:00+01:00
dtype: object
Upvotes: 1