Reputation: 339
I am trying to compute the Jaccard similarity between each pair of names in two large vectors of names (see below for a small example) and to store the results in a matrix. My function just returns NULL. What am I doing wrong?
library(dplyr)
df = data.frame(matrix(NA, ncol=3, nrow=3))
df = df %>%
  mutate_if(is.logical, as.numeric)
names(df) = c("A.J. Doyle", "A.J. Graham", "A.J. Porter")
draft_names = names(df)
row.names(df) = c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
quarterback_names = row.names(df)
library(stringdist)
jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for (j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method = 'jaccard', q = 2)
    }
  }
}
df = jaccard_similarity(df)
Upvotes: 2
Views: 1221
Reputation: 9124
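Reason: there is no explicit return. The last expression evaluated in the function is the for loop, and a for loop evaluates to NULL, so that is what the function returns.
You can add a print call to trace the values and return d at the end, like below: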
jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for (j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method = 'jaccard', q = 2)
      print(d[i,j])  # trace each computed value
    }
  }
  return(d)
}
Output:
[1] 0.6428571
[1] 0.75
[1] 0.75
[1] 0.7647059
[1] 0.7777778
[1] 0.7777778
[1] 1
[1] 1
[1] 1
You can simply call jaccard_similarity(df) to get the values:
output <- jaccard_similarity(df)
              A.J. Doyle A.J. Graham A.J. Porter
A.J. Feeley    0.6428571   0.7500000   0.7500000
A.J. McCarron  0.7647059   0.7777778   0.7777778
Aaron Brooks   1.0000000   1.0000000   1.0000000
Assign the output to a new variable rather than overwriting the existing df.
Upvotes: 0
Reputation: 66834
You are not returning anything after the for loops. Use return(d) at the end of the function.
This problem is also a classic use case for outer:
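To see why the missing return matters, here is a minimal sketch using two hypothetical toy functions (no_return and with_return are made up for illustration, not from the question). An R function returns the value of its last evaluated expression, and a for loop evaluates to NULL:

```r
# Hypothetical toy function: the last expression is the for loop itself,
# so the modified copy of x is discarded and NULL is returned.
no_return <- function(x) {
  for (i in seq_along(x)) {
    x[i] <- x[i] * 2
  }
}

# Same loop, but the modified vector is the last expression.
# Writing return(x) here would be equivalent.
with_return <- function(x) {
  for (i in seq_along(x)) {
    x[i] <- x[i] * 2
  }
  x
}

is.null(no_return(1:3))  # TRUE
with_return(1:3)         # 2 4 6
```

The same applies to jaccard_similarity: d is modified only inside the function, so it has to be handed back to the caller.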
outer(quarterback_names,draft_names,FUN=stringdist,method="jaccard",q=2)
          [,1]      [,2]      [,3]
[1,] 0.6428571 0.7500000 0.7500000
[2,] 0.7647059 0.7777778 0.7777778
[3,] 1.0000000 1.0000000 1.0000000
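The matrix returned by outer carries no row or column labels. A small sketch, assuming the stringdist package is installed and using the example names from the question, that attaches the names afterwards and, as an alternative, uses stringdistmatrix from the same package to compute all pairs directly:

```r
library(stringdist)

quarterback_names <- c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
draft_names <- c("A.J. Doyle", "A.J. Graham", "A.J. Porter")

# All-pairs Jaccard distance on 2-grams via outer(), then label the result.
m <- outer(quarterback_names, draft_names,
           FUN = stringdist, method = "jaccard", q = 2)
dimnames(m) <- list(quarterback_names, draft_names)

# Cells can now be looked up by name.
m["A.J. Feeley", "A.J. Doyle"]

# stringdistmatrix() computes the same all-pairs matrix without outer().
m2 <- stringdistmatrix(quarterback_names, draft_names,
                       method = "jaccard", q = 2)
```

For large name vectors, either form avoids filling a data frame cell by cell in a double loop.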
Upvotes: 3
Reputation: 43169
You need to return your changed dataframe:
jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for (j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method = 'jaccard', q = 2)
    }
  }
  return(d)
  # ^^^
}
jaccard_similarity(df)
yields
              A.J. Doyle A.J. Graham A.J. Porter
A.J. Feeley    0.6428571   0.7500000   0.7500000
A.J. McCarron  0.7647059   0.7777778   0.7777778
Aaron Brooks   1.0000000   1.0000000   1.0000000
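Note that stringdist with method = 'jaccard' returns a distance (1 minus the Jaccard similarity of the two q-gram sets), so a value of 1 means the names share no 2-grams. As a rough check, here is a base R sketch that recomputes one cell by hand. The helpers qgram_set and jaccard_distance are hypothetical names written for this illustration, not part of stringdist, and they do not handle strings shorter than q on both sides:

```r
# Hypothetical helper: the set of unique q-grams of a single string.
qgram_set <- function(s, q = 2) {
  n <- nchar(s)
  if (n < q) return(character(0))
  starts <- seq_len(n - q + 1)
  unique(substring(s, starts, starts + q - 1))
}

# Hypothetical helper: 1 - |intersection| / |union| of the q-gram sets.
jaccard_distance <- function(a, b, q = 2) {
  x <- qgram_set(a, q)
  y <- qgram_set(b, q)
  1 - length(intersect(x, y)) / length(union(x, y))
}

d <- jaccard_distance("A.J. Feeley", "A.J. Doyle")
d      # 9/14, about 0.6428571, the first cell of the table above
1 - d  # the corresponding similarity, 5/14
```

If a similarity matrix is what you want, subtract the returned matrix from 1.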
Upvotes: 2