I am trying to find similar users using Jaccard similarity.
I want to turn the original df into a result df.
Each value in the result df is the size of the intersection divided by the size of the union of two users' items.
For example, the similarity between User 1 and User 2 is 1/2:
1/2 = (number of items both users have in common) / (total number of distinct items the two users have between them)
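Written as a formula, with A and B standing for the sets of items each user has ('Y'), this is the Jaccard index:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
```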
In this way, I want to create a result df that contains the similarity between every pair of users.
What should I do?
Write a function that divides the number of items both users have in common by the number of items that either user has.
calc <- function(x, y) {
  # items both users have / items at least one of them has
  sum(x == 'Y' & y == 'Y')/sum(x == 'Y' | y == 'Y')
}
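As a quick sanity check (a sketch, not part of the original answer), `calc` can be applied to the rows for User 1 and User 2 taken from the `df` under data; the names `user1` and `user2` are only illustrative.

```r
# Jaccard similarity of two Y/N vectors (same definition as calc above)
calc <- function(x, y) {
  sum(x == 'Y' & y == 'Y') / sum(x == 'Y' | y == 'Y')
}

# Rows 1 and 2 of df (item1 .. item5)
user1 <- c('Y', 'Y', 'N', 'N', 'N')
user2 <- c('Y', 'N', 'N', 'N', 'N')

calc(user1, user2)
#> [1] 0.5
```

Note that two users with no 'Y' items at all give 0/0, which R returns as `NaN`.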
Split the data row-wise and use outer:
tmp <- asplit(df, 1)
outer(tmp, tmp, Vectorize(calc))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1.000 0.5 0.0 0.333 0.4
#[2,] 0.500 1.0 0.0 0.000 0.2
#[3,] 0.000 0.0 1.0 0.000 0.4
#[4,] 0.333 0.0 0.0 1.000 0.4
#[5,] 0.400 0.2 0.4 0.400 1.0
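A possible vectorized alternative (my own sketch, not the approach above): convert the Y/N data to a 0/1 matrix, get the pairwise intersection counts with `tcrossprod()`, and compute each union size as |A| + |B| minus the intersection. The names `m`, `shared`, `either` and `sim` are illustrative.

```r
df <- data.frame(item1 = c('Y', 'Y', 'N', 'N', 'Y'),
                 item2 = c('Y', 'N', 'N', 'Y', 'Y'),
                 item3 = c('N', 'N', 'Y', 'N', 'Y'),
                 item4 = c('N', 'N', 'Y', 'N', 'Y'),
                 item5 = c('N', 'N', 'N', 'Y', 'Y'))

# 0/1 matrix: one row per user, one column per item
m <- (as.matrix(df) == 'Y') * 1

# shared[i, j] = number of items users i and j both have (m %*% t(m))
shared <- tcrossprod(m)

# either[i, j] = |A| + |B| - |A and B| = size of the union
n_items <- rowSums(m)
either <- outer(n_items, n_items, "+") - shared

sim <- shared / either
round(sim, 3)
```

This should give the same matrix as the `outer()` call while avoiding one R function call per pair, which may help with many users. As with `calc`, a pair of users with no 'Y' items gives `NaN`.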
data
It would be helpful if you provided data in a reproducible format instead of as an image.
df <- data.frame(item1 = c('Y', 'Y', 'N', 'N', 'Y'),
item2 = c('Y', 'N', 'N', 'Y', 'Y'),
item3 = c('N', 'N', 'Y', 'N', 'Y'),
item4 = c('N', 'N', 'Y', 'N', 'Y'),
item5 = c('N', 'N', 'N', 'Y', 'Y'))
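Since the question asks for a result df, the matrix can be labelled and converted with `as.data.frame()`. This is a sketch; the `User1`, `User2`, ... labels are assumed, since the original row names are not known.

```r
df <- data.frame(item1 = c('Y', 'Y', 'N', 'N', 'Y'),
                 item2 = c('Y', 'N', 'N', 'Y', 'Y'),
                 item3 = c('N', 'N', 'Y', 'N', 'Y'),
                 item4 = c('N', 'N', 'Y', 'N', 'Y'),
                 item5 = c('N', 'N', 'N', 'Y', 'Y'))

# Jaccard similarity of two Y/N vectors
calc <- function(x, y) {
  sum(x == 'Y' & y == 'Y') / sum(x == 'Y' | y == 'Y')
}

# Pairwise similarities, as in the answer above
tmp <- asplit(df, 1)
res <- outer(tmp, tmp, Vectorize(calc))

# Assumed labels User1, User2, ... in row order
users <- paste0("User", seq_len(nrow(df)))
dimnames(res) <- list(users, users)

result_df <- as.data.frame(res)
result_df["User1", "User2"]
#> [1] 0.5
```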