What is the best way to compute a similarity matrix for a dataframe of binary vectors?

Question

I have a data frame of size m x n whose rows are binary vectors, with some values missing, like the sample below:

col1  col2  col3  col4  col5
V0    1     NULL  0     1
V1    1     1     NULL  0
V2    0     1     0     1
V3    NULL  0     0     NULL

I would like to compute a similarity matrix from this data frame that gives a similarity score for every pair of vectors (rows).
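One possible approach, sketched under assumptions: treat the blank cells as missing values (NaN), take col1 as the row labels, and compare each pair of rows only on the columns observed in both. The function name `masked_binary_similarity` is hypothetical. It returns two common binary similarity measures, the simple matching coefficient (share of co-observed columns where the two rows agree) and the Jaccard index (columns where both rows are 1, divided by columns where at least one row is 1), plus the number of co-observed columns behind each score.

```python
import numpy as np
import pandas as pd

# Sample from the question; blanks are ASSUMED to be missing values (NaN)
# and col1 is ASSUMED to hold the row labels.
df = pd.DataFrame(
    {"col2": [1, 1, 0, np.nan], "col3": [np.nan, 1, 1, 0],
     "col4": [0, np.nan, 0, 0], "col5": [1, 0, 1, np.nan]},
    index=pd.Index(["V0", "V1", "V2", "V3"], name="col1"),
)


def masked_binary_similarity(df: pd.DataFrame):
    """Hypothetical helper: pairwise similarities using only columns observed in BOTH rows.

    Returns (matching, jaccard, overlap) as row-by-row DataFrames:
      matching: share of co-observed columns where the two rows agree
      jaccard:  co-observed columns with 1 in both / with 1 in at least one
      overlap:  number of co-observed columns behind each score
    """
    x = df.to_numpy(dtype=float)
    obs = ~np.isnan(x)                        # True where a value is present
    o = obs.astype(float)
    ones = np.where(obs, x, 0.0)              # 1 where observed and equal to 1
    zeros = np.where(obs, 1.0 - x, 0.0)       # 1 where observed and equal to 0

    overlap = o @ o.T                         # co-observed column counts
    both_one = ones @ ones.T                  # co-observed columns with 1 in both rows
    agree = both_one + zeros @ zeros.T        # co-observed columns with equal values
    any_one = ones @ o.T + o @ ones.T - both_one  # co-observed, at least one 1

    with np.errstate(divide="ignore", invalid="ignore"):
        matching = agree / overlap            # NaN if no co-observed column
        jaccard = both_one / any_one          # NaN if no co-observed 1s

    labels = df.index
    return tuple(
        pd.DataFrame(m, index=labels, columns=labels)
        for m in (matching, jaccard, overlap)
    )


matching, jaccard, overlap = masked_binary_similarity(df)
print(matching.round(3))
print(overlap)
```

Which measure fits depends on whether a shared 0 should count as agreement: the matching coefficient counts it, Jaccard ignores it. It is worth checking `overlap` before trusting a score; in this sample V0 and V3 share only col4, so their matching score of 1.0 rests on a single column.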

What is the best way to do this?

Note: I tried replacing the NULL values with 2 and applying SciPy's cosine similarity to the data frame, but the resulting matrix was not correct.
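A likely reason the filled matrix looks wrong: 2 is not a binary value, and every position that is missing in both vectors contributes 2 × 2 = 4 to the dot product, so shared gaps dominate the score. The small sketch below uses made-up vectors (not from the question) that disagree on their only co-observed position. It calls `scipy.spatial.distance.cosine`, which returns a cosine distance, so similarity is taken as 1 minus that value.

```python
import numpy as np
from scipy.spatial.distance import cosine

# Made-up example: the vectors DISAGREE on their only co-observed position
# (index 0); the other positions are missing in both.
a = np.array([1.0, np.nan, np.nan])
b = np.array([0.0, np.nan, np.nan])

# The approach from the question: replace missing values with 2.
a_filled = np.nan_to_num(a, nan=2.0)
b_filled = np.nan_to_num(b, nan=2.0)

# scipy's cosine() is a distance, so similarity = 1 - distance.
filled_sim = 1.0 - cosine(a_filled, b_filled)

# Agreement measured only over positions observed in both vectors.
both = ~np.isnan(a) & ~np.isnan(b)
observed_agreement = float(np.mean(a[both] == b[both]))

print(round(filled_sim, 3), observed_agreement)
```

Here the filled vectors score about 0.94 even though their agreement on observed positions is 0, which is why masking missing positions is generally safer than filling them with a sentinel value.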
