niukasu
niukasu

Reputation: 287

Merging dataframes with unhashable columns

I want to merge two Pandas DataFrames. If the item code (e.g. A,B,C,D) are the same, their attributes a, b must be the same, but b is a numpy array or a list that is unhashable.

Foo:

item   a     b              
A      1     [2,0] 
B      1     [3,0]         
C      0     [4,0]         

Bar:

item   a     b
A      1     [2,0]
D      0     [6,1]

This is what I want

code   a     b        Foo   Bar
A      1     [2,0]    1     1
B      1     [3,0]    1     0
C      0     [4,0]    1     0
D      0     [6,1]    0     1

Upvotes: 6

Views: 19947

Answers (4)

piRSquared
piRSquared

Reputation: 294318

cols = ['a', 'b', 'item']
pd.concat([Foo, Bar], keys=['Foo', 'Bar']) \
    .assign(c=1).pipe(lambda d: d.assign(b=d.b.apply(tuple))) \
    .set_index(cols, append=True) \
    .c.unstack(0, fill_value=0).reset_index(cols) \
    .pipe(lambda d: d.assign(b=d.b.apply(list)))

   a       b item  Bar  Foo
0  1  [2, 0]    A    1    1
1  0  [6, 1]    D    1    0
1  1  [3, 0]    B    0    1
2  0  [4, 0]    C    0    1

Upvotes: 2

BENY
BENY

Reputation: 323276

Or you can try this

foo.b = foo.b.apply(tuple)
bar.b = bar.b.apply(tuple)
df=pd.concat([foo,bar],axis=0).drop_duplicates()
df['foo']=df.isin(foo).a.astype(int)
df['bar']=df.isin(bar).a.astype(int)
df.b=df.b.apply(list)
df
Out[60]: 
   a       b item  foo  bar
0  1  [2, 0]    A    1    1
1  1  [3, 0]    B    1    0
2  0  [4, 0]    C    1    0
1  0  [6, 1]    D    0    1

Upvotes: 3

unutbu
unutbu

Reputation: 879691

Here is a way to merge without converting the unhashables to tuples.

Since the item code has a 1-to-1 correspondence with the values in the a and b columns, it suffices to merge on item alone. Since the values in the item column are hashable, there is no problem merging:

import pandas as pd

foo = pd.DataFrame({'item': list('ABC'), 'a':[1,1,0], 'b':[[2,0], [3,0], [4,0]]})
bar = pd.DataFrame({'item': list('AD'), 'a':[1,0], 'b':[[2,0], [6,1]]})

result = pd.merge(foo.assign(Foo=1), bar.assign(Bar=1), on='item', how='outer',
                  suffixes=['', '_y'])
for col in ['a','b']:
    result[col].update(result[col+'_y'])

for col in ['Foo', 'Bar']:
    result[col] = result[col].fillna(0)
result = result.drop(['a_y', 'b_y'], axis=1)
print(result)

yields

     a       b item  Foo  Bar
0  1.0  [2, 0]    A  1.0  1.0
1  1.0  [3, 0]    B  1.0  0.0
2  0.0  [4, 0]    C  1.0  0.0
3  0.0  [6, 1]    D  0.0  1.0

There is a bit of touch-up work needed after the merge, however. Since we only merge on item, result gets two columns of a and b -- the ones from bar are called a_y, and b_y. The update method is used to fill in NaN values from a with corresponding values from a_y, and then the same is also done for b.

The clever idea to use foo.assign(Foo=1), bar.assign(Bar=1) to obtain the Foo and Bar columns was taken from cᴏʟᴅsᴘᴇᴇᴅ's solution.

Upvotes: 6

cs95
cs95

Reputation: 402553

You could use df.merge and df.fillna:

out = foo.assign(Foo=1).merge(bar.assign(Bar=1), 'outer').fillna(0)
print(out)

  item  a       b  Foo  Bar
0    A  1  (2, 0)  1.0  1.0
1    B  1  (3, 0)  1.0  0.0
2    C  0  (4, 0)  1.0  0.0
3    D  0  (6, 1)  0.0  1.0

If b is a list type, you could convert it to a tuple first and then merge.

foo.b = foo.b.apply(tuple)
bar.b = bar.b.apply(tuple)
out = foo.assign(Foo=1).merge(bar.assign(Bar=1), 'outer').fillna(0)
out.b = out.b.apply(list)

print(out)

  item  a       b  Foo  Bar
0    A  1  [2, 0]  1.0  1.0
1    B  1  [3, 0]  1.0  0.0
2    C  0  [4, 0]  1.0  0.0
3    D  0  [6, 1]  0.0  1.0

Upvotes: 9

Related Questions