BillyJean

Reputation: 1577

Find number of matching rows between dataframes in a list

I have a list containing N data frames. For simplicity, this question uses N = 3:

asd<-list()

asd[[1]]<-data.frame("one"=c(1:3), "two"=c("a","b","c"))
asd[[2]]<-data.frame("one"=c(3:5), "two"=c("c","b","a"))
asd[[3]]<-data.frame("one"=c(5:7), "two"=c("a","b","c"))

I'd like to compare these data frames to each other and get an N x N matrix whose entry (i,j) tells me how many rows are identical between data frames i and j.

So, for the above we get a 3x3 matrix with elements (i,j)

(1,1)=(2,2)=(3,3) 3 (3 rows are identical)
(1,2)=(2,1)       1 (1 row is identical)
(1,3)=(3,1)       0 (0 rows are identical)
(2,3)=(3,2)       1 (1 row is identical)

Which function can I use for this in R?

Upvotes: 2

Views: 80

Answers (2)

alexis_laz

Reputation: 13122

It might be more convenient to replace the "list" structure with a single "data.frame" that records which data frame each row came from:

asd2 = cbind(do.call(rbind, asd), 
             df = rep(seq_along(asd), sapply(asd, nrow)))

If each distinct row across the "data.frame"s in asd is mapped to a single number, the problem becomes simpler:

r = tapply(1:nrow(asd2), asd2[c("one", "two")])
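
To see what this grouping index looks like, here is a minimal self-contained sketch that rebuilds the question's data. It relies on tapply() returning one group code per element when no FUN is supplied; the variable names follow the answer, and the checks at the end are only illustrative.

```r
# Example data from the question
asd <- list(
  data.frame(one = 1:3, two = c("a", "b", "c")),
  data.frame(one = 3:5, two = c("c", "b", "a")),
  data.frame(one = 5:7, two = c("a", "b", "c"))
)

# Stack all rows and tag each with the index of its source data frame
asd2 <- cbind(do.call(rbind, asd),
              df = rep(seq_along(asd), sapply(asd, nrow)))

# Without a FUN argument, tapply() returns one integer group code per row;
# rows with the same (one, two) combination get the same code
r <- tapply(seq_len(nrow(asd2)), asd2[c("one", "two")])

# Rows 3 and 4 are both (3, "c"); rows 6 and 7 are both (5, "a")
r[3] == r[4]
r[6] == r[7]

# Number of distinct rows across all data frames
length(unique(r))
```

Identical rows end up with the same code regardless of which data frame they came from, which is what the cross-tabulation in the next step depends on.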

Then, following the crossprod(table()) approach (here, a variation with a sparse matrix):

library(Matrix)
crossprod(xtabs( ~ r + df, asd2, sparse = TRUE))
#3 x 3 sparse Matrix of class "dsCMatrix"
#  1 2 3
#1 3 1 .
#2 1 3 1
#3 . 1 3
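
For comparison, the same idea can be sketched in base R with a dense table() instead of a sparse xtabs(). This is only a sketch under the assumption that the data are small enough for a dense table; the names tab and res are illustrative.

```r
# Example data from the question
asd <- list(
  data.frame(one = 1:3, two = c("a", "b", "c")),
  data.frame(one = 3:5, two = c("c", "b", "a")),
  data.frame(one = 5:7, two = c("a", "b", "c"))
)

# Stack all rows and tag each with the index of its source data frame
asd2 <- cbind(do.call(rbind, asd),
              df = rep(seq_along(asd), sapply(asd, nrow)))

# One integer code per distinct (one, two) combination
r <- tapply(seq_len(nrow(asd2)), asd2[c("one", "two")])

# Dense contingency table: one row per row code, one column per data frame
tab <- table(r, df = asd2$df)

# crossprod(tab) is t(tab) %*% tab, so entry (i, j) sums, over all row codes,
# (occurrences in data frame i) * (occurrences in data frame j)
res <- crossprod(tab)
res
```

Because each entry is a sum of products of occurrence counts, it equals the number of shared rows only when no data frame contains the same row more than once; calling unique() on each data frame first may be needed otherwise.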

Upvotes: 2

akrun

Reputation: 886938

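Perhaps this helps. Note that it compares rows position by position, so it assumes all data frames have the same number of rows and misses identical rows that sit at different positions, such as the shared row (3, "c") of data frames 1 and 2: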

sapply(seq_along(asd), function(i) 
   sapply(seq_along(asd), function(j) sum(rowSums(asd[[i]]==asd[[j]])==2)))

Or it could be

sapply(seq_along(asd), function(i) 
    sapply(seq_along(asd), function(j) sum(duplicated(rbind(asd[[i]], asd[[j]])))))
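
The duplicated(rbind()) idea can be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original answer: count_shared_rows is a hypothetical name, and the unique() calls are an added assumption to keep rows repeated within a single data frame from inflating the count. It assumes all data frames share the same column names and types.

```r
# Example data from the question
asd <- list(
  data.frame(one = 1:3, two = c("a", "b", "c")),
  data.frame(one = 3:5, two = c("c", "b", "a")),
  data.frame(one = 5:7, two = c("a", "b", "c"))
)

# Hypothetical helper: number of distinct rows present in both x and y.
# After de-duplicating each input, any row flagged by duplicated() in the
# stacked data must occur in both x and y.
count_shared_rows <- function(x, y) {
  sum(duplicated(rbind(unique(x), unique(y))))
}

# Build the N x N matrix for any number of data frames in the list
n <- length(asd)
res <- sapply(seq_len(n), function(i)
  sapply(seq_len(n), function(j) count_shared_rows(asd[[i]], asd[[j]])))
res
```

Unlike a position-wise comparison, this does not require matching rows to appear at the same row index, and the data frames may have different numbers of rows.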

Or another option is

`dim<-`(unlist(do.call(Map, c(f= function(...) 
        sum(duplicated(rbind(...))), expand.grid(asd, asd)))), rep(length(asd), 2))

Upvotes: 3
