Guillaume Thomas
Guillaume Thomas

Reputation: 2310

pandas.merge fails when merging on timestamps columns with tzinfo

I need to do a merge on Timestamps columns but the behaviour depends on wether the timezone is set or not.

The following code works fine

import pandas as pd, datetime
now = datetime.datetime.now()
df1 = pd.DataFrame({'ts': pd.to_datetime([now])})
df2 = pd.DataFrame({'ts': pd.to_datetime([now])})
pd.merge(df1, df2, on='ts')

In contrast, this does not

import pandas as pd, datetime
now = datetime.datetime.now().replace(tzinfo=pytz.utc)
df3 = pd.DataFrame({'ts': pd.to_datetime([now])})
df4 = pd.DataFrame({'ts': pd.to_datetime([now])})
pd.merge(df3, df4, on='ts')

I have the following error

/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
     33                          right_index=right_index, sort=sort, suffixes=suffixes,
     34                          copy=copy, indicator=indicator)
---> 35     return op.get_result()
     36 if __debug__:
     37     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in get_result(self)
    194             self.left, self.right = self._indicator_pre_merge(self.left, self.right)
    195 
--> 196         join_index, left_indexer, right_indexer = self._get_join_info()
    197 
    198         ldata, rdata = self.left._data, self.right._data

/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in _get_join_info(self)
    323              right_indexer) = _get_join_indexers(self.left_join_keys,
    324                                                  self.right_join_keys,
--> 325                                                  sort=self.sort, how=self.how)
    326 
    327             if self.right_index:

/path/to/env3.4/lib/python3.4/site-packages/pandas/tools/merge.py in _get_join_indexers(left_keys, right_keys, sort, how)
    514 
    515     # get left & right join labels and num. of levels at each location
--> 516     llab, rlab, shape = map(list, zip( * map(fkeys, left_keys, right_keys)))
    517 
    518     # get flat i8 keys from label lists

TypeError: type object argument after * must be a sequence, not map

My environment:

  1. python 3.4
  2. pandas 0.17.0
  3. numpy 1.10.1

dtypes are different:

In [4]: df1.dtypes
Out[4]: 
ts    datetime64[ns]
dtype: object

In [5]: df3.dtypes
Out[5]: 
ts    datetime64[ns, UTC]
dtype: object

The handling of date in pandas are a bit cryptic. You have to know :

What am i missing here?

Upvotes: 3

Views: 2737

Answers (1)

Jeff
Jeff

Reputation: 128948

This was a bug in 0.17.0, fixed in master here, and will be in forthcoming 0.17.1.

w/o tz

In [13]: now = datetime.datetime.now()

In [14]: df1 = pd.DataFrame({'ts': pd.to_datetime([now])})

In [15]: df2 = pd.DataFrame({'ts': pd.to_datetime([now])})

In [16]: pd.merge(df1, df2, on='ts')
Out[16]: 
                          ts
0 2015-11-01 18:33:59.771962

with tz

In [8]: now = datetime.datetime.now().replace(tzinfo=pytz.utc)

In [9]: df3 = pd.DataFrame({'ts': pd.to_datetime([now])})

In [10]: df4 = pd.DataFrame({'ts': pd.to_datetime([now])})

In [11]: pd.merge(df3, df4, on='ts')
Out[11]: 
                                ts
0 2015-11-01 18:32:46.801009+00:00

Upvotes: 2

Related Questions