CMu

Calculating the similarity of logical vectors

I have two logical vectors and I want to measure how close (similar) their TRUE values are. For example, take these two vectors:

df <- data.frame(c(T, F, F, F, T, T, F, T), c(F, T, F, T, F, T, F, T))

I tried this:

sum(df[[1]] & df[[2]])
[1] 2

The problem is that this only counts the TRUE values that are in the same position, whereas I'd like to know how close they are and compare different vectors with this method. I know there are ways to do this for numeric vectors (Euclidean distance, for example), but I didn't find an equivalent for logical vectors.
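
As a sketch of the "numeric equivalent" point: logical vectors coerce to 1/0, so ordinary numeric distances already apply. The Euclidean distance on the 0/1 coding is the square root of the Hamming distance (the number of mismatched positions). The names x and y are just the two columns of df above. Note that, like the overlap count, both measures still compare position by position.

```r
# The two columns of df from the question
x <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
y <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Hamming distance: number of positions where the vectors disagree
hamming <- sum(x != y)

# Euclidean distance on the 1/0 coding; each mismatch contributes 1 to the sum
euclid <- sqrt(sum((as.numeric(x) - as.numeric(y))^2))

hamming  # 4
euclid   # 2, i.e. sqrt(hamming)
```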

EDIT: The positions of the values must affect the similarity between the two vectors. For example, in this data frame:

  [,1] [,2] [,3] [,4]
a    1    0    0    0
b    0    1    0    0
c    0    0    0    1

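The similarity between vectors a and b should be greater than between b and c, since the TRUE in b is one position away from the TRUE in a but two positions away from the TRUE in c.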
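
A quick base R check of why this requirement matters: both the overlap count tried above and the Euclidean distance on the 1/0 coding give exactly the same value for the pair a/b as for the pair b/c, so neither can express "a is closer to b than c is". The matrix m is just the example above written as logicals.

```r
# The three rows from the example above, as a logical matrix
m <- rbind(
  a = c(TRUE,  FALSE, FALSE, FALSE),
  b = c(FALSE, TRUE,  FALSE, FALSE),
  c = c(FALSE, FALSE, FALSE, TRUE)
)

# Overlap count (the approach tried in the question): TRUEs in the same position
overlap_ab <- sum(m["a", ] & m["b", ])  # 0
overlap_bc <- sum(m["b", ] & m["c", ])  # 0

# Euclidean distance between rows, after coercing TRUE/FALSE to 1/0
d <- as.matrix(dist(m * 1))
d["a", "b"]  # sqrt(2)
d["b", "c"]  # sqrt(2), the same as for a/b
```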

Answers (1)

knytt

The ade4 package has a convenient function, dist.binary(), that calculates various distances/indices for binary data (think of TRUE/FALSE as 1/0). You might want to look up the details of the simple matching coefficient and the Jaccard index; here is a paper dealing with similarity measures on categorical data.
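
Both coefficients are built from the 2x2 table of agreements between two binary vectors. The sketch below computes them by hand in base R; binary_similarity() is a hypothetical helper written for illustration, not part of ade4. The Jaccard index ignores positions where both vectors are FALSE, while the simple matching coefficient counts them as agreement (the Jaccard index is undefined, NaN here, if both vectors are all FALSE).

```r
# Hypothetical helper (not from ade4): similarity coefficients from the 2x2 counts
binary_similarity <- function(x, y) {
  n11 <- sum(x & y)    # both TRUE
  n10 <- sum(x & !y)   # TRUE only in x
  n01 <- sum(!x & y)   # TRUE only in y
  n00 <- sum(!x & !y)  # both FALSE
  c(
    # Jaccard: shared TRUEs over positions where at least one is TRUE
    jaccard = n11 / (n11 + n10 + n01),
    # Simple matching: all agreements (TRUE/TRUE and FALSE/FALSE) over length
    simple_matching = (n11 + n00) / (n11 + n10 + n01 + n00)
  )
}

x <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
y <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
binary_similarity(x, y)  # jaccard 2/6, simple_matching 4/8
```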

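For instance, using the simple matching coefficient (dist.binary() reports it as a distance, sqrt(1 - S), rather than as the similarity S itself):

names(df) <- c("a", "b")
# one row per vector, coded as 1/0
df <- t(sapply(df, as.numeric))

# method = 2 is the simple matching coefficient
ade4::dist.binary(df, method = 2L)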
          a
b 0.7071068
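
To see where 0.7071068 comes from without ade4: the vectors agree in 4 of 8 positions, so the simple matching coefficient is 0.5, and ade4's documentation describes its distances as d = sqrt(1 - s). A minimal base R reproduction:

```r
x <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
y <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Simple matching coefficient: share of positions where the vectors agree
smc <- mean(x == y)       # 4 of 8 positions agree -> 0.5

# Convert the similarity to a distance the way dist.binary() reports it
smc_dist <- sqrt(1 - smc)
smc_dist                  # prints 0.7071068
```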
