PK Stroh

Reputation: 75

Compute overlap between columns of two DataFrames as a square co-occurrence matrix

I am looking for the overlap between two dataframes, column by column.

import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

Logic:

For each pair of columns (one V, one X), count the values that appear in both, with one row per V and the Xs as columns.

Expected result:

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1

I have tried a couple of variations of set intersection while iterating over the columns, but that seems like the wrong path.

Upvotes: 1

Views: 106

Answers (1)

cs95

Reputation: 402263

You can do this with an outer equality comparison with NumPy:

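import numpy as np

# Compare every cell of df1 with every cell of df2, then sum over both row axes.
pd.DataFrame(np.equal.outer(df1.to_numpy(), df2.to_numpy()).sum(axis=(0, 2)),
             index=df1.columns,
             columns=df2.columns)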

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1
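
To see why summing over axes 0 and 2 gives the right shape, here is a minimal sketch that spells out the intermediate 4-D array and cross-checks the counts against plain set intersections. The names `a`, `b`, `eq`, `counts`, and the helper `overlap_by_sets` are only illustrative, and converting with `to_numpy()` first is an assumption made so the ufunc works on plain arrays.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

a = df1.to_numpy()  # shape (rows1, cols1)
b = df2.to_numpy()  # shape (rows2, cols2)

# eq[i, j, k, l] is True when df1.iloc[i, j] == df2.iloc[k, l],
# so eq has shape (rows1, cols1, rows2, cols2).
eq = np.equal.outer(a, b)

# Summing over both row axes (0 and 2) leaves one count per (V, X) column pair.
counts = pd.DataFrame(eq.sum(axis=(0, 2)), index=df1.columns, columns=df2.columns)


def overlap_by_sets(left, right):
    """Illustrative cross-check: count distinct shared values per column pair."""
    return pd.DataFrame(
        {x: [len(set(left[v]) & set(right[x])) for v in left.columns]
         for x in right.columns},
        index=left.columns,
    )


by_sets = overlap_by_sets(df1, df2)
print(counts)
print(by_sets)
```

The two approaches agree here because values are unique within each column. If a column contained repeated values, the outer comparison would count every matching pair of cells, while the set version would count each distinct shared value once.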

Upvotes: 2
