Reputation: 2310
I need to do a merge on Timestamps columns but the behaviour depends on wether the timezone is set or not.
The following code works fine
import pandas as pd, datetime
now = datetime.datetime.now()
df1 = pd.DataFrame({'ts': pd.to_datetime([now])})
df2 = pd.DataFrame({'ts': pd.to_datetime([now])})
pd.merge(df1, df2, on='ts')
In contrast, this does not
import pandas as pd, datetime
now = datetime.datetime.now().replace(tzinfo=pytz.utc)
df3 = pd.DataFrame({'ts': pd.to_datetime([now])})
df4 = pd.DataFrame({'ts': pd.to_datetime([now])})
pd.merge(df3, df4, on='ts')
I have the following error
/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
33 right_index=right_index, sort=sort, suffixes=suffixes,
34 copy=copy, indicator=indicator)
---> 35 return op.get_result()
36 if __debug__:
37 merge.__doc__ = _merge_doc % '\nleft : DataFrame'
/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in get_result(self)
194 self.left, self.right = self._indicator_pre_merge(self.left, self.right)
195
--> 196 join_index, left_indexer, right_indexer = self._get_join_info()
197
198 ldata, rdata = self.left._data, self.right._data
/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in _get_join_info(self)
323 right_indexer) = _get_join_indexers(self.left_join_keys,
324 self.right_join_keys,
--> 325 sort=self.sort, how=self.how)
326
327 if self.right_index:
/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in _get_join_indexers(left_keys, right_keys, sort, how)
514
515 # get left & right join labels and num. of levels at each location
--> 516 llab, rlab, shape = map(list, zip( * map(fkeys, left_keys, right_keys)))
517
518 # get flat i8 keys from label lists
TypeError: type object argument after * must be a sequence, not map
My environment:
dtypes are different:
In [4]: df1.dtypes
Out[4]:
ts datetime64[ns]
dtype: object
In [5]: df3.dtypes
Out[5]:
ts datetime64[ns, UTC]
dtype: object
The handling of date in pandas are a bit cryptic. You have to know :
What am i missing here?
Upvotes: 3
Views: 2737
Reputation: 128948
This was a bug in 0.17.0, fixed in master here, and will be in forthcoming 0.17.1.
w/o tz
In [13]: now = datetime.datetime.now()
In [14]: df1 = pd.DataFrame({'ts': pd.to_datetime([now])})
In [15]: df2 = pd.DataFrame({'ts': pd.to_datetime([now])})
In [16]: pd.merge(df1, df2, on='ts')
Out[16]:
ts
0 2015-11-01 18:33:59.771962
with tz
In [8]: now = datetime.datetime.now().replace(tzinfo=pytz.utc)
In [9]: df3 = pd.DataFrame({'ts': pd.to_datetime([now])})
In [10]: df4 = pd.DataFrame({'ts': pd.to_datetime([now])})
In [11]: pd.merge(df3, df4, on='ts')
Out[11]:
ts
0 2015-11-01 18:32:46.801009+00:00
Upvotes: 2