Reputation: 1577
I have a list containing N data frames. For this question we look at N=3 for simplicity:
asd<-list()
asd[[1]]<-data.frame("one"=c(1:3), "two"=c("a","b","c"))
asd[[2]]<-data.frame("one"=c(3:5), "two"=c("c","b","a"))
asd[[3]]<-data.frame("one"=c(5:7), "two"=c("a","b","c"))
I'd like to compare these data frames to each other and get an N x N matrix whose entries (i,j) tell me how many rows are identical between data frames i and j.
So, for the above we get a 3x3 matrix with elements (i,j)
(1,1)=(2,2)=(3,3) 3 (3 rows are identical)
(1,2)=(2,1) 1 (1 row is identical)
(1,3)=(3,1) 0 (0 rows are identical)
(2,3)=(3,2) 1 (1 row is identical)
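For reference, the target output listed above can be written out as an R matrix. This only restates the expected values for the example; the name `expected` is made up for illustration.

```r
# Expected 3 x 3 result for the example list `asd`,
# entry (i, j) = number of rows shared by data frames i and j
expected <- matrix(c(3, 1, 0,
                     1, 3, 1,
                     0, 1, 3),
                   nrow = 3, byrow = TRUE)
expected
```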
Which function can I use for this in R?
Upvotes: 2
Views: 80
Reputation: 13122
It might be more convenient to replace the "list" structure with a single "data.frame":
asd2 = cbind(do.call(rbind, asd),
df = rep(seq_along(asd), sapply(asd, nrow)))
If each row of the asd "data.frame"s could be mapped to a single number, the problem becomes simpler:
r = tapply(1:nrow(asd2), asd2[c("one", "two")])
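To make the "one number per row" idea concrete, here is a rough sketch of an alternative way to build such an id with paste() and match(). The names `key` and `r2` are made up for illustration, and the sketch assumes the separator "|" never occurs in the data.

```r
asd <- list(
  data.frame(one = 1:3, two = c("a", "b", "c")),
  data.frame(one = 3:5, two = c("c", "b", "a")),
  data.frame(one = 5:7, two = c("a", "b", "c"))
)
asd2 <- cbind(do.call(rbind, asd),
              df = rep(seq_along(asd), sapply(asd, nrow)))

# One string key per row; identical rows get the same key
# (assumption: "|" does not appear in the column values)
key <- paste(asd2$one, asd2$two, sep = "|")

# Integer id for each distinct key, numbered by first appearance
r2 <- match(key, unique(key))
r2
```

Rows that are identical across data frames end up with the same id, which is all the later counting step needs.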
Then, following the crossprod(table()) approach (or a variation with a sparse matrix):
library(Matrix)
crossprod(xtabs( ~ r + df, asd2, sparse = TRUE))
#3 x 3 sparse Matrix of class "dsCMatrix"
# 1 2 3
#1 3 1 .
#2 1 3 1
#3 . 1 3
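If the Matrix package is not available, the same idea can probably be sketched in base R with a dense table. This assumes no data frame contains duplicate rows internally; otherwise the cross product counts matching pairs rather than distinct rows. The names `tab` and `res` are made up for illustration.

```r
asd <- list(
  data.frame(one = 1:3, two = c("a", "b", "c")),
  data.frame(one = 3:5, two = c("c", "b", "a")),
  data.frame(one = 5:7, two = c("a", "b", "c"))
)
asd2 <- cbind(do.call(rbind, asd),
              df = rep(seq_along(asd), sapply(asd, nrow)))

# Group id per row: identical (one, two) combinations share an id
r <- tapply(seq_len(nrow(asd2)), asd2[c("one", "two")])

# Rows: distinct row ids, columns: data frames, cells: 0/1 counts
tab <- table(r, df = asd2$df)

# t(tab) %*% tab: entry (i, j) counts the ids present in both i and j
res <- crossprod(unclass(tab))
res
```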
Upvotes: 2
Reputation: 886938
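Perhaps this helps. Note that it compares rows position by position, so it only counts rows that match at the same row index (for the example it gives 0, not 1, for data frames 1 and 2):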
sapply(seq_along(asd), function(i)
sapply(seq_along(asd), function(j) sum(rowSums(asd[[i]]==asd[[j]])==2)))
Or it could be
sapply(seq_along(asd), function(i)
sapply(seq_along(asd), function(j) sum(duplicated(rbind(asd[[i]], asd[[j]])))))
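The duplicated() version counts a row as shared only if rows are unique within each data frame. Under the same assumption, a possible variation is to count the rows of an inner join with merge(), which joins on all common columns by default. The helper name `count_shared` is made up for illustration.

```r
asd <- list(
  data.frame(one = 1:3, two = c("a", "b", "c")),
  data.frame(one = 3:5, two = c("c", "b", "a")),
  data.frame(one = 5:7, two = c("a", "b", "c"))
)

# Number of rows common to data frames i and j (inner join on all columns)
count_shared <- function(i, j) nrow(merge(asd[[i]], asd[[j]]))

# outer() needs a vectorized function, hence Vectorize()
res <- outer(seq_along(asd), seq_along(asd), Vectorize(count_shared))
res
```

This works for any N, since the matrix dimensions come from seq_along(asd) rather than a hard-coded size.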
Or another option is
`dim<-`(unlist(do.call(Map, c(f= function(...)
sum(duplicated(rbind(...))), expand.grid(asd, asd)))), rep(length(asd), 2))
Upvotes: 3