Lee Tom

Reputation: 93

Element wise comparison between two dataframes (one of them containing lists)

I have two DataFrames, and one of them is made up of lists.

    in [00]: table01
    out[00]: 
       a  b
    0  1  2
    1  2  3

    in [01]: table02
    out[01]: 
          a     b
    0   [2]   [3]
    1 [1,2] [1,2]

Now I want to compare the two tables element-wise. If an element of table01 is contained in the list at the same position in table02, the result should be True, otherwise False. So the table I want is:

          a     b
    0 False False  
    1  True False

I have tried table01 in table02 but got an error message: 'DataFrame' objects are mutable, thus they cannot be hashed.
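For context on that error: the in operator on a DataFrame does not search the cell values. DataFrame.__contains__ checks whether the key is one of the column labels, and looking a key up in the column Index hashes it first, which a DataFrame refuses. A small sketch, rebuilding the two tables above as plain DataFrames:

```python
import pandas as pd

table01 = pd.DataFrame({"a": [1, 2], "b": [2, 3]})
table02 = pd.DataFrame({"a": [[2], [1, 2]], "b": [[3], [1, 2]]})

# `x in df` checks the column labels, not the cell values.
print("a" in table01)  # True: "a" is a column label
print(2 in table01)    # False: 2 is only a cell value

# Testing a whole DataFrame makes pandas hash it, which raises TypeError.
try:
    table01 in table02
except TypeError as err:
    print("TypeError:", err)
```

So an element-wise comparison has to walk the cells explicitly, which is what the answers below do in different ways.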

Please share the correct solution to this problem. Thanks a lot!

Upvotes: 0

Views: 2295

Answers (3)

BENY

Reputation: 323226

Try this:

import pandas as pd

# df1 and df2 are table01 and table02 from the question
df=pd.melt(df1.reset_index(),'index')
df['v2']=pd.melt(df2.reset_index(),'index').value
df['BOOL']=df.apply(lambda x: x.value in x.v2, axis=1)
df.pivot(index='index', columns='variable', values='BOOL')

Out[491]: 
variable      a      b
index                 
0         False  False
1          True  False
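The two pd.melt calls above line up only because both frames have the same index and columns in the same order, so the rows of the second melt can be copied over positionally. A sketch of the same reshape, compare, pivot idea that joins on the cell coordinates instead (the frames are rebuilt from the question's data; long1, long2 and the "lists" column name are illustrative choices, not part of the answer):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [2, 3]})
df2 = pd.DataFrame({"a": [[2], [1, 2]], "b": [[3], [1, 2]]})

# Long format: one row per (row label, column label) cell.
long1 = df1.reset_index().melt(id_vars="index", value_name="value")
long2 = df2.reset_index().melt(id_vars="index", value_name="lists")

# Join on the cell coordinates rather than relying on matching row order.
merged = long1.merge(long2, on=["index", "variable"])
merged["BOOL"] = [v in lst for v, lst in zip(merged["value"], merged["lists"])]

# Back to wide format: rows from "index", columns from "variable".
result = merged.pivot(index="index", columns="variable", values="BOOL")
print(result)
```

The merge costs a little more than the positional assignment, but it keeps giving correct results if one frame's columns or rows are in a different order.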

Finally, a more compact version:

df1.apply(lambda col: [col[y] in df2.loc[y, col.name] for y in col.index])
Out[668]: 
       a      b
0  False  False
1   True  False

Upvotes: 1

cs95

Reputation: 402253

Using sets and df.applymap:

df3 = df1.applymap(lambda x: {x})
df4 = df2.applymap(set)

df3 & df4    
     a   b
0   {}  {}
1  {2}  {}

(df3 & df4).astype(bool)    
       a      b
0  False  False
1   True  False
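pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map. A sketch of the same set-intersection idea that runs on older and newer versions alike by mapping each column with Series.map (the frames are rebuilt from the question's data):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [2, 3]})
df2 = pd.DataFrame({"a": [[2], [1, 2]], "b": [[3], [1, 2]]})

# Map column by column with Series.map instead of DataFrame.applymap.
df3 = df1.apply(lambda col: col.map(lambda x: {x}))  # scalar -> one-element set
df4 = df2.apply(lambda col: col.map(set))             # list -> set

# `&` intersects the sets cell by cell; an empty set is falsy under astype(bool).
result = (df3 & df4).astype(bool)
print(result)
```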

user3847943's solution is a good alternative, but it can be improved by converting the lists to sets first, so that each membership test is a hash lookup instead of a linear scan of the list:

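import numpy as np
import pandas as pd

def find_in_array(a, b):
    return a in b

# convert every list in df2 to a set (note: this modifies df2 in place)
for c in df2.columns:
    df2[c] = df2[c].map(set)

vfunc = np.vectorize(find_in_array)

df = pd.DataFrame(vfunc(df1, df2), index=df1.index, columns=df1.columns)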
df

       a      b
0  False  False
1   True  False
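The loop above overwrites df2 with sets. If the original lists are still needed, the sets can go into a separate frame instead. This sketch also passes otypes to np.vectorize; without it, NumPy calls the function on the first element to infer the output type. The name df2_sets is only illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [2, 3]})
df2 = pd.DataFrame({"a": [[2], [1, 2]], "b": [[3], [1, 2]]})

def find_in_array(a, b):
    return a in b

# Build the set version in a new frame so df2 keeps its lists.
df2_sets = df2.apply(lambda col: col.map(set))

# otypes=[bool] fixes the output dtype up front.
vfunc = np.vectorize(find_in_array, otypes=[bool])

df = pd.DataFrame(vfunc(df1, df2_sets), index=df1.index, columns=df1.columns)
print(df)
```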

Upvotes: 4

dimithriavindra

Reputation: 31

You can do this with numpy.vectorize. Sample code:

import numpy as np 
import pandas as pd 

t1 = pd.DataFrame([[1, 2],[2,3]])
t2 = pd.DataFrame([[[2],[3]],[[1,2],[1,2]]])

def find_in_array(a, b):
    return a in b

vfunc = np.vectorize(find_in_array)

print(vfunc(t1, t2))
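The NumPy documentation notes that np.vectorize is provided for convenience rather than performance and is essentially a for loop. An equivalent sketch with a plain nested comprehension, which also wraps the result back into a DataFrame so the labels are kept (it assumes t1 and t2 have the same shape and label order):

```python
import pandas as pd

t1 = pd.DataFrame([[1, 2], [2, 3]])
t2 = pd.DataFrame([[[2], [3]], [[1, 2], [1, 2]]])

# Walk both frames row by row in lockstep and test each scalar
# against the list stored in the same cell of t2.
values = [
    [item in cell for item, cell in zip(row1, row2)]
    for row1, row2 in zip(t1.to_numpy(), t2.to_numpy())
]

# Rebuild a DataFrame so the row and column labels survive.
result = pd.DataFrame(values, index=t1.index, columns=t1.columns)
print(result)
```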

Upvotes: 1
