Reputation: 53
not sure if I chose a good title at all... Also I don't know if I use the right terminology, so maybe with the right search terms I would find a solution to this problem...
I have a list of strings from which I want to have all sets of "exclusive" combinations of 3.
Example: With the following
require(utils)
mylist<-c("strA","strB","strC","strD","strE","strF")
t(combn(mylist,3))
I get a table that lists all possible combinations of 3 out of those 6 strings (so each row represents one combination of 3):
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strA" "strB" "strD"
[3,] "strA" "strB" "strE"
[4,] "strA" "strB" "strF"
[5,] "strA" "strC" "strD"
[6,] "strA" "strC" "strE"
[7,] "strA" "strC" "strF"
[8,] "strA" "strD" "strE"
[9,] "strA" "strD" "strF"
[10,] "strA" "strE" "strF"
[11,] "strB" "strC" "strD"
[12,] "strB" "strC" "strE"
[13,] "strB" "strC" "strF"
[14,] "strB" "strD" "strE"
[15,] "strB" "strD" "strF"
[16,] "strB" "strE" "strF"
[17,] "strC" "strD" "strE"
[18,] "strC" "strD" "strF"
[19,] "strC" "strE" "strF"
[20,] "strD" "strE" "strF"
but I would like to have all sets of combinations of 3, in which each string only appears once. So my desired output would look sth like this:
$1
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "srtE" "strF"
$2
[,1] [,2] [,3]
[1,] "strA" "strB" "strD"
[1,] "strC" "strE" "strF"
$3
[,1] [,2] [,3]
[1,] "strA" "strB" "strE"
[1,] "strC" "strD" "strF"
...
So here each subelement ($1
, $2
, $3
, etc.) contains 2 combinations of 3 strings (as 2*3=6; with 6 strings). In each set each string must not appear more than once.
Of course it would be great if this would be also possible for lengths of mylist
that are not a multiple of n=3
. In case we have let's say 10 strings (with the addition of "strG", "strH", "strI", and "strJ"), I would like to have one string ommitted in each combination. So the desired result would be sth like
$1
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
[3,] "strG" "strH" "strI"
$2
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
[3,] "strG" "strH" "strJ"
$3
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
[3,] "strG" "strI" "strJ"
$4
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strE" "strF" "strG"
[3,] "strH" "strI" "strJ"
...
Does someone have a solution to this? If my explanation was unclear, please let me know.
Cheers
Upvotes: 2
Views: 132
Reputation: 53
Based on 42's help (thanks again!) I figured out a way which is anything but elegant, but does the job (slowly...). But only because I could eliminate some of the possible combinations before doing the following steps it was feasible this way. In my original problem I have 49 strings, which would result in extremely big vectors, so be careful when applying the following steps to more than, say, 15 strings. There is surely a way to calculate how many combinations have to be processed...
Here comes the complete example
require(utils)
mylist<-paste("str",LETTERS[1:10],sep="")
mat<-as.data.frame(t(combn(mylist, 3, simplify = TRUE)))
mat[] <- lapply(mat, as.character)
mat.subset<-list()
for (i in seq(nrow(mat)))
{
mat.temp<-mat
j=1
mat.subset[[i]]<-mat[i,]
rem.row<-sort(unique(c(which(mat.temp[,1]%in%mat[i,1:3]),which(mat.temp[,2]%in%mat[i,1:3]),which(mat.temp[,3]%in%mat[i,1:3]))))
mat.temp<-mat.temp[-rem.row,]
while (j<=nrow(mat.temp))
{
if(!length(intersect(mat.temp[j,1:3],unlist(mat.subset[[i]]))))
{
mat.subset[[i]]<-rbind(mat.subset[[i]],mat.temp[j,])
rem.row<-sort(unique(c(which(mat.temp[,1]%in%mat.temp[j,1:3]),which(mat.temp[,2]%in%mat[i,1:3]),which(mat.temp[,3]%in%mat[i,1:3]))))
mat.temp<-mat.temp[-rem.row,]
}
j<-j+1
}
}
mat.subset.lengths<-unlist(lapply(mat.subset,function(x) nrow(x)))
mat.subset<-mat.subset[which(mat.subset.lengths==max(mat.subset.lengths))]
The last two steps were necessary in my case as I had, as stated above, excluded some combinations before the time consuming for
loop, and only a certain number of starting points would yield a complete solution (or in the worst case close to complete solution).
If you have a hint that there are more sets than covered with this procedure or if you have a more elgant way, I would appreciate your input.
Upvotes: 1
Reputation: 263342
Will assume the transposed combo matrix is named mat
. Check to see if there is any overlap with length applied to results of intersect
function:
res <- list();
for (i in 1:nrow(mat) ){
for( j in 1:nrow(mat)){
if( !length(intersect(mat[i,] , mat[j,])) )
res[[paste(i,j,sep="_")]] <- rbind( mat[i,], mat[j, ]) } }
> res
$`1_20`
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
$`2_19`
[,1] [,2] [,3]
[1,] "strA" "strB" "strD"
[2,] "strC" "strE" "strF"
$`3_18`
[,1] [,2] [,3]
[1,] "strA" "strB" "strE"
[2,] "strC" "strD" "strF"
.... snipped
Depending on your definition of "unique" you might decide to only take the first ten items since half of these are transpositions of rows:
> res[[1]]
[,1] [,2] [,3]
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
> res[[20]]
[,1] [,2] [,3]
[1,] "strD" "strE" "strF"
[2,] "strA" "strB" "strC"
Upvotes: 1