Reputation: 75
I am looking for the overlap between two dataframes, column by column.
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})
Logic: for each pair of columns, count how many values the V column shares with the X column, with one row per V and the Xs as columns.
Expected result:
    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1
I have tried a couple of variations of set intersection while iterating over the columns, but that seems like the wrong path.
Upvotes: 1
Views: 106
Reputation: 402263
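You can do this with an outer equality comparison in NumPy. The outer comparison checks every cell of df1 against every cell of df2, and summing over the two row axes (0 and 2) leaves one count per pair of columns:
import numpy as np

# to_numpy() hands plain arrays to the ufunc
pd.DataFrame(np.equal.outer(df1.to_numpy(), df2.to_numpy()).sum(axis=(0, 2)),
             index=df1.columns,
             columns=df2.columns)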
    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1
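To make the axis choice easier to follow, here is a minimal sketch that spells out the intermediate array and cross-checks the result against a plain set intersection. The names `eq`, `outer_counts`, and `set_counts` are mine, not part of the original answer. Note one assumption: the outer comparison counts matching pairs of cells, while the set version counts distinct shared values, so the two only agree when no column contains duplicates (as in the sample data).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

# eq[i, j, k, l] is True where df1.iloc[i, j] == df2.iloc[k, l]
eq = np.equal.outer(df1.to_numpy(), df2.to_numpy())
print(eq.shape)  # (rows of df1, cols of df1, rows of df2, cols of df2)

# Summing over both row axes (0 and 2) leaves a (df1 columns) x (df2 columns) grid
outer_counts = pd.DataFrame(eq.sum(axis=(0, 2)),
                            index=df1.columns,
                            columns=df2.columns)

# Cross-check: number of distinct values shared by each pair of columns
set_counts = pd.DataFrame(
    {x: [len(set(df1[v]) & set(df2[x])) for v in df1.columns] for x in df2.columns},
    index=df1.columns,
)

print(outer_counts)
print(set_counts)
```

The set-based version is slower on large frames because it loops over column pairs in Python, but it may be the better fit if duplicates within a column should only be counted once.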
Upvotes: 2