create a scoring matrix from two dataframes

Question

I am trying to compare sets of variables(X) that are stored in two dataframes (foo, bar). Each X is a unique independent variable that has up to 10 values of Y associated with it. I would like to compare every foo.X with every bar.X by comparing the number of Y values they have in common - so the output could be a matrix with axes of foo.x by bar.x in length.

this simple example of foo and bar would want to return a 2x2 matrix comparing a,b with c,d:

foo <- data.frame(x= c('a', 'a', 'a', 'b', 'b', 'b'), y=c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x= c('c', 'c', 'c', 'd', 'd', 'd'), y=c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))

EDIT:

I've left the following code for other newbies to learn from (for loops are effectvie but probably very suboptimal), but the two solutions below are effective. In particular Ramnath's use of data.table is very effective when dealing with very large dataframes.

store the dataframes as lists where the values of y are stored using the stack function

foo.list <- dlply(foo, .(x), function(x) stack(x, select = y))
bar.list <- dlply(bar, .(x),function(x)  stack(x, select = y))

write a function for comparing membership in the two stacked lists

comparelists <- function(list1, list2) {
  for (i in list1){ 
    for (j in list2){
      count <- 0
      if (i[[1]] %in% j[[1]]) count <- count + 1
    }
  }
  return count
  }

write an output matrix

output.matrix <- matrix(1:length(foo.list), 1:length(bar.list))
for (i in foo.list){
  for (j in bar.list){
    output.matrix[i,j] <- comparelists(i,j)

    }

}

Ramnath · Accepted Answer

Here is a simpler approach using merge

library(reshape2)
df1 <- merge(foo, bar, by = 'y')
dcast(df1, x.x ~ x.y, length)

  x.x c d
1   a 1 0
2   b 0 1

EDIT. The merge can be faster using data.table. Here is the code

foo_dt <- data.table(foo, key = 'y')
bar_dt <- data.table(bar, key = 'y')
df1 <- bar_dt[foo_dt, nomatch = 0]

create a scoring matrix from two dataframes

Answers (2)

Related Questions