celialayla

Reputation: 23

Apply function to list of data frames in R

I have a list of data frames, each with three columns, so every row is a 3-dimensional vector. I would like to compute the cosine similarity (lsa::cosine) between each pair of consecutive rows in each data frame (rows 1 and 2, 2 and 3, 3 and 4, etc.). How can I loop over the data frames in the list to calculate these consecutive-row cosine similarities, keeping the results separate for each data frame?

Here is some simple fake data for reproducibility:

df1 = data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 = data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist = list(df1, df2)
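As a check on what is being asked for: writing r_1, ..., r_n for the rows of one data frame, the target values are the n - 1 consecutive-row cosines below. For the first two rows of df1 this works out to about 0.9493, which is the first value both answers report.

```latex
\cos(\mathbf{r}_i, \mathbf{r}_{i+1})
  = \frac{\mathbf{r}_i \cdot \mathbf{r}_{i+1}}{\lVert \mathbf{r}_i \rVert \, \lVert \mathbf{r}_{i+1} \rVert},
  \qquad i = 1, \dots, n - 1

\cos\big((1,2,5),\,(2,3,4)\big)
  = \frac{1 \cdot 2 + 2 \cdot 3 + 5 \cdot 4}{\sqrt{30}\,\sqrt{29}}
  = \frac{28}{\sqrt{870}} \approx 0.9493
```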

Thanks in advance!

Upvotes: 2

Views: 176

Answers (2)

thelatemail

Reputation: 93938

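If your data.frames/matrices aren't big, you could transpose each one (lsa::cosine compares the columns of a matrix, so transposing turns the rows into columns), calculate the similarity between every pair of rows, and then take the first superdiagonal of the returned matrix so that only consecutive rows are compared. Note that this computes the full n x n similarity matrix even though only n - 1 of its entries are needed: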

library(lsa)
lapply(dflist, \(x) {
  m <- cosine(as.matrix(t(x)))
  m[(col(m)-row(m)) == 1]
})
#[[1]]
#[1] 0.9492889 0.9553946 0.9714890 0.9844672
#
#[[2]]
#[1] 0.9635201 0.9747824 0.9850197 0.9915254
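To see which entries the col(m) - row(m) == 1 mask picks out, here is a small sketch on a made-up 4 x 4 matrix (an illustration only, not part of the answer). Because R indexes in column-major order, the mask returns the (1,2), (2,3), (3,4) entries in that order, which is the same as taking the diagonal of m with its last row and first column dropped:

```r
# A toy 4 x 4 matrix whose entries encode their position as row * 10 + col
m <- outer(1:4, 1:4, function(i, j) i * 10 + j)

# Logical mask for the first superdiagonal: column index is one more than row index
sup1 <- m[(col(m) - row(m)) == 1]
print(sup1)
# [1] 12 23 34

# Equivalent extraction: drop the last row and first column, then take the diagonal
sup2 <- diag(m[-nrow(m), -1])
print(sup2)
# [1] 12 23 34
```

Either form gives the similarity of row i with row i + 1 for i = 1, ..., n - 1.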

Upvotes: 1

akrun

Reputation: 887951

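We may use sapply over the list and, within each data frame, mapply over the rows without the last one paired with the rows without the first one (asplit splits each of these into row vectors):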

library(lsa)
sapply(dflist, function(x) mapply(function(u, v)
   c(cosine(as.vector(u), as.vector(v))), 
   asplit(x[-nrow(x), ], 1), asplit(x[-1, ], 1)))
#       [,1]      [,2]
#1 0.9492889 0.9635201
#2 0.9553946 0.9747824
#3 0.9714890 0.9850197
#4 0.9844672 0.9915254
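If you would rather not depend on lsa, a minimal base-R sketch (swapping lsa::cosine for plain arithmetic) is to stack "rows 1 to n-1" against "rows 2 to n" and compute all n - 1 dot products and norms at once with rowSums. The helper name consecutive_cosine is hypothetical, used only for this sketch; a row of all zeros would give NaN.

```r
# Example data from the question
df1 <- data.frame(y1 = c(1,2,3,4,5), y2 = c(2,3,4,5,6), y3 = c(5,4,3,2,1))
df2 <- data.frame(y1 = c(6,7,8,9,10), y2 = c(6,5,4,3,2), y3 = c(1,3,5,7,9))
dflist <- list(df1, df2)

# Hypothetical helper: cosine similarity between each row and the next one,
# vectorized over rows instead of building the full similarity matrix
consecutive_cosine <- function(x) {
  m <- as.matrix(x)
  a <- m[-nrow(m), , drop = FALSE]  # rows 1 .. n-1
  b <- m[-1, , drop = FALSE]        # rows 2 .. n
  rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2))
}

res <- lapply(dflist, consecutive_cosine)
print(res)
# res[[1]] is approximately 0.9492889 0.9553946 0.9714890 0.9844672,
# matching the lsa-based answers above
```

Using sapply(dflist, consecutive_cosine) instead of lapply gives the same 4 x 2 layout as the answer above, just without the row names that asplit adds.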

Upvotes: 1
