Reputation: 212
I have 3 vectors containing more than 1500 character strings, which I want to compare pairwise to get the number of matching elements between them.
A reduced sample of my data:
va <- c("6a460daf68eb0410b51d79e495fbccc7", "e1b32017108e17e41bdabc44bac4df3c", "6ac1327da92d8584008db04b4eaf62d0", "b01a2322e2ca99315646d79cf157cb20", "12dadc27059ea5d3c8cc54e9a28cc4f6", "be73c9685b743a646f2eb0480eee2f8d")
vb <- c("6a460daf68eb0410b51d79e495fbccc7", "e1b32017108e17e41bdabc44bac4df3c","JQ183785.1.1345", "DQ794886.1.1390", "HQ791014.1.1450", "EU764755.1.1328")
vc <- c("6a460daf68eb0410b51d79e495fbccc7", "JQ183785.1.1345", "DQ794886.1.1390", "HQ791014.1.1450", "b01a2322e2ca99315646d79cf157cb20", "EF532786.1.1364")
I have made a function for outputting the number of coincident elements between two vectors:
sharing <- function(v1, v2, share = TRUE) {
  # share = TRUE: count elements of v1 found in v2; share = FALSE: count those not found
  if (isTRUE(share)) {
    sh <- sum(v1 %in% v2)
  } else {
    sh <- sum(!v1 %in% v2)
  }
  return(sh)
}
So, by applying this function 9 times (once for each pairwise comparison, including self-comparisons), I can get the 9 counts of shared elements:
> sharing(va,va); sharing(va,vb); sharing(va,vc)
[1] 6
[1] 2
[1] 2
> sharing(vb,va); sharing(vb,vb); sharing(vb,vc)
[1] 2
[1] 6
[1] 4
> sharing(vc,va); sharing(vc,vb); sharing(vc,vc)
[1] 2
[1] 4
[1] 6
But I would like to get this as a matrix:
   va vb vc
va  6  2  2
vb  2  6  4
vc  2  4  6
Is there an existing function or piece of code that can produce this?
Thanks for the help!
Upvotes: 2
Views: 401
Reputation: 887068
One option is outer, to apply the sharing function to each pairwise combination of the vectors in a list ('lst1'):
lst1 <- mget(paste0("v", letters[1:3])) # collect va, vb, vc in a named list
out <- outer(lst1, lst1, FUN = Vectorize(sharing)) # apply sharing to every pair of vectors
dimnames(out) <- list(names(lst1), names(lst1)) # set the dim names
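If you would rather not go through Vectorize, a nested sapply over a named list is another way to build the same kind of matrix. This is only a minimal sketch, not a replacement for the code above: the names vecs and shared are made up for illustration, and the count is written inline as sum(x %in% y) rather than calling sharing.

```r
# Sample vectors copied from the question
va <- c("6a460daf68eb0410b51d79e495fbccc7", "e1b32017108e17e41bdabc44bac4df3c",
        "6ac1327da92d8584008db04b4eaf62d0", "b01a2322e2ca99315646d79cf157cb20",
        "12dadc27059ea5d3c8cc54e9a28cc4f6", "be73c9685b743a646f2eb0480eee2f8d")
vb <- c("6a460daf68eb0410b51d79e495fbccc7", "e1b32017108e17e41bdabc44bac4df3c",
        "JQ183785.1.1345", "DQ794886.1.1390", "HQ791014.1.1450", "EU764755.1.1328")
vc <- c("6a460daf68eb0410b51d79e495fbccc7", "JQ183785.1.1345", "DQ794886.1.1390",
        "HQ791014.1.1450", "b01a2322e2ca99315646d79cf157cb20", "EF532786.1.1364")

# Named list (hypothetical name), so the names carry over to rows and columns
vecs <- list(va = va, vb = vb, vc = vc)

# Entry [i, j] counts the elements of vecs[[i]] that occur in vecs[[j]]
shared <- sapply(vecs, function(y) sapply(vecs, function(x) sum(x %in% y)))
shared
```

On the sample data this should give the matrix asked for in the question. Since %in% counts every matching position, a vector with repeated elements can make the result asymmetric, just as with sharing.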
Upvotes: 1