Reputation: 23
I have a list of data frames in which each row is a 3-dimensional vector (3 columns). I would like to compute the cosine similarity (lsa::cosine) for each pair of consecutive rows in each data frame (i.e., rows 1 and 2, 2 and 3, 3 and 4, etc.). How can I loop through each data frame in the list to calculate the cosine similarities of consecutive rows, keeping the cosine values separate for each data frame?
Here is some easy fake data for reproducibility purposes:
df1 = data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 = data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist = list(df1, df2)
Thanks in advance!
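For reference, the cosine similarity of two vectors a and b is sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))), which is what lsa::cosine returns for two vectors. A minimal base-R sketch of that formula, applied by hand to rows 1 and 2 of df1, can serve as a check on the expected values. The helper name cos_sim is hypothetical and not part of lsa:

```r
# Hypothetical helper: cosine similarity of two numeric vectors (base R only)
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))

# unlist() turns a one-row data frame into a plain numeric vector:
# row 1 is (1, 2, 5) and row 2 is (2, 3, 4)
r1 <- unlist(df1[1, ])
r2 <- unlist(df1[2, ])

# Dot product 28, squared norms 30 and 29, so 28 / sqrt(870), about 0.9493
cos_sim(r1, r2)
```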
Upvotes: 2
Views: 176
Reputation: 93938
If your data frames/matrices aren't big, you could transpose each one, calculate the cosine similarity between every pair of rows, and then subset the first off-diagonal of the returned matrix so that only consecutive rows are compared:
library(lsa)
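lapply(dflist, \(x) {              # \(x) is shorthand for function(x) (R >= 4.1)
  m <- cosine(as.matrix(t(x)))     # t() turns rows into columns; cosine() compares columns
  m[(col(m) - row(m)) == 1]        # first off-diagonal: rows 1-2, 2-3, 3-4, ...
})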
#[[1]]
#[1] 0.9492889 0.9553946 0.9714890 0.9844672
#
#[[2]]
#[1] 0.9635201 0.9747824 0.9850197 0.9915254
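Two notes on this approach. The condition col(m) - row(m) == 1 is TRUE exactly on the first superdiagonal (elements [1,2], [2,3], ...), and because R indexes matrices in column-major order those elements come back in row order. Also, cosine() computes all n x n pairwise similarities even though only n - 1 are kept. A minimal vectorized base-R sketch that computes only the consecutive pairs might look like this; the function name consecutive_cosine is hypothetical and the code does not use lsa:

```r
# Superdiagonal selection on a toy 4 x 4 matrix (filled column by column)
toy <- matrix(1:16, nrow = 4)
toy[col(toy) - row(toy) == 1]  # 5 10 15, i.e. toy[1,2], toy[2,3], toy[3,4]

# Hypothetical helper: cosine similarity between each row and the next one
consecutive_cosine <- function(x) {
  m <- as.matrix(x)
  a <- m[-nrow(m), , drop = FALSE]  # rows 1 .. n-1
  b <- m[-1, , drop = FALSE]        # rows 2 .. n
  # Row-wise dot products divided by the product of row-wise norms
  rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2))
}

df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 <- data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist <- list(df1, df2)

res <- lapply(dflist, consecutive_cosine)
res
```

This avoids building the n x n matrix, so memory grows linearly with the number of rows rather than quadratically.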
Upvotes: 1
Reputation: 887951
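We may use sapply with mapply, pairing each row except the last (x[-nrow(x), ]) with the row after it (x[-1, ]); asplit() splits each data frame into a list of rows: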
library(lsa)
sapply(dflist, function(x) mapply(function(u, v)
    c(cosine(as.vector(u), as.vector(v))),  # c() turns the 1 x 1 result into a plain number
    asplit(x[-nrow(x), ], 1),               # rows 1 .. n-1
    asplit(x[-1, ], 1)))                    # rows 2 .. n
[,1] [,2]
1 0.9492889 0.9635201
2 0.9553946 0.9747824
3 0.9714890 0.9850197
4 0.9844672 0.9915254
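One caveat with sapply: it only simplifies the result to a matrix when every data frame yields the same number of similarities, i.e. when all the data frames have the same number of rows. If they differ, sapply returns a list instead, so downstream code has to handle both shapes. If you always want one vector per data frame, lapply is the predictable choice. The sketch below illustrates this with a hypothetical three-row data frame df3 (not part of the question's data) and a base-R helper standing in for lsa::cosine; both helper names are hypothetical:

```r
# Base-R stand-in for lsa::cosine on two vectors (hypothetical helper)
cos_sim <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))

# Cosine similarity of each row with the following row, one pair at a time
consecutive_cos <- function(x) {
  m <- as.matrix(x)
  vapply(seq_len(max(nrow(m) - 1L, 0L)),
         function(i) cos_sim(m[i, ], m[i + 1L, ]),
         numeric(1))
}

df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df3 <- data.frame(y1 = c(1,0,1), y2 = c(0,1,1), y3 = c(0,0,0))  # hypothetical, 3 rows
uneven <- list(df1, df3)

s <- sapply(uneven, consecutive_cos)  # lengths 4 and 2: a list, not a matrix
l <- lapply(uneven, consecutive_cos)  # always a list, one vector per data frame
```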
Upvotes: 1