Reputation: 37
i have two dataframes and one of the columns in both contains a set . now if both sets are equal i want to merge the dataframes based on that column.
i tried using df.merge as
les=df_report_2.merge(df_report_1,how='inner',on='saltids')
df_report_1 and df_report_2 are the dataframe and saltids is thecolumn which consists of a set datatype.
i get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-c8ca93f2c1e8> in <module>()
----> 1 les=df_report_2.merge(df_report_1,how='inner',on='saltids')
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
6866 right_on=right_on, left_index=left_index,
6867 right_index=right_index, sort=sort, suffixes=suffixes,
-> 6868 copy=copy, indicator=indicator, validate=validate)
6869
6870 def round(self, decimals=0, *args, **kwargs):
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
46 copy=copy, indicator=indicator,
47 validate=validate)
---> 48 return op.get_result()
49
50
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in get_result(self)
544 self.left, self.right)
545
--> 546 join_index, left_indexer, right_indexer = self._get_join_info()
547
548 ldata, rdata = self.left._data, self.right._data
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_info(self)
754 else:
755 (left_indexer,
--> 756 right_indexer) = self._get_join_indexers()
757
758 if self.right_index:
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(self)
733 self.right_join_keys,
734 sort=self.sort,
--> 735 how=self.how)
736
737 def _get_join_info(self):
/usr/local/lib/python2.7/dist-packages/pandas/core/reshape/merge.pyc in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
1128
1129 # get left & right join labels and num. of levels at each location
-> 1130 llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
1131
1132 # get flat i8 keys from label lists
TypeError: type object argument after * must be an iterable, not itertools.imap
Upvotes: 1
Views: 284
Reputation: 51
As a workaround you can probably join on hashed values of the columns.
Since sets cannot be hashed, you need to convert to frozen set and then calculate hash.
Code would look like
df_report_1['saltids_hashed'] = df_report_1['saltids'].map(lambda x: hash(frozenset(x)))
df_report_2['saltids_hashed'] = df_report_2['saltids'].map(lambda x: hash(frozenset(x)))
les=df_report_2.merge(df_report_1,how='inner',on='saltids_hashed')
You can obviously drop the hashed column in the final output.
Upvotes: 1