Reputation: 69329
I want to generate every distinct combination of splitting the lower-case alphabet into two sets of six letters and two sets of seven letters. The order of the letters within the sets doesn't matter, i.e. if two solutions differ only in the order of the letters within the sets, then those solutions are identical.
I.e. these two solutions are identical:
[a,b,c,d,e,f][g,h,i,j,k,l][m,n,o,p,q,r,s][t,u,v,w,x,y,z] [f,b,c,d,e,a][l,h,i,j,k,g][s,n,o,p,q,r,m][z,u,v,w,x,y,t]
A naive approach might be to generate every permutation of the 26 letters plus 2 dummies, split these evenly into four groups and discard duplication solutions (and ignore the dummies when I use the data). But this seems rather inefficient. I'm sure there's a well-known algorithm out there to do this, but I'm struggling to search for this given the wide range of similar but different permutation/combination problems out there.
Is there an existing, named algorithm that can split nk elements into n sets of k elements, generating every combination of those sets? If not, I'll have to hack something together myself. But this feels like a problem that's already been solved.
Upvotes: 3
Views: 1352
Reputation: 156459
I don't know of any algorithm names for this (though one probably exists), but the approach I mentioned in the comments avoids dealing with duplicates and is as efficient as I imagine you can get.
It seems like you could get some improvement by turning the problem on its head: Each letter has to go into one of four buckets, and the buckets have limited space, so recursively try putting each letter into each bucket that has room for it. This way you're only producing combinations rather than permutations.
Here's a C# implementation. It can generate 10,000,000 combinations in under 30 seconds, and 2/3 of that time is spent just in building the string outputs:
void Main()
{
// Tweak these starting values to create smaller subsets if you want.
var letters = Enumerable.Range(0, 26).Select(i => (char)('a' + i)).ToList();
var buckets = new[]{new Bucket(6), new Bucket(6), new Bucket(7), new Bucket(7)};
// I'm only taking 100 values because otherwise this would take a really long time.
var combos = Combos(letters, 0, buckets).Take(100);
foreach (var combo in combos)
{
Console.WriteLine(combo);
}
}
public class Bucket : List<char>
{
public int MaxLoad {get; private set;}
public Bucket(int capacity) : base(capacity)
{
MaxLoad = capacity;
}
}
// Define other methods and classes here
IEnumerable<string> Combos(IList<char> letters, int currentIndex, Bucket[] buckets)
{
if(currentIndex == letters.Count){
yield return string.Join("|", buckets.Select(b => string.Join(",", b)));
yield break;
}
var currentLetter = letters[currentIndex];
foreach (var bucket in buckets)
{
if(bucket.Count < bucket.Capacity)
{
bucket.Add(currentLetter);
foreach (var possibility in Combos(letters, currentIndex + 1, buckets))
{
yield return possibility;
}
bucket.Remove(currentLetter);
}
}
}
Sample output:
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,s|t,u,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,t|s,u,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,u|s,t,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,v|s,t,u,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,w|s,t,u,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,x|s,t,u,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,y|s,t,u,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,r,z|s,t,u,v,w,x,y
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,t|r,u,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,u|r,t,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,v|r,t,u,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,w|r,t,u,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,x|r,t,u,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,y|r,t,u,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,s,z|r,t,u,v,w,x,y
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,u|r,s,v,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,v|r,s,u,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,w|r,s,u,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,x|r,s,u,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,y|r,s,u,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,t,z|r,s,u,v,w,x,y
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,v|r,s,t,w,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,w|r,s,t,v,x,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,x|r,s,t,v,w,y,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,y|r,s,t,v,w,x,z
a,b,c,d,e,f|g,h,i,j,k,l|m,n,o,p,q,u,z|r,s,t,v,w,x,y
...
One nice feature of the approach I've given is that you can process the results as they are produced--you don't have to wait for the whole list to be generated, and you don't need to have all the combinations in memory at the same time.
But be aware that you're going to end up with a lot of combinations--quite possibly more than the computer can generate in any reasonable amount of time regardless of algorithmic efficiency. If Vincent's estimate of 10^12 is correct, for example, it would take roughly a year using the code above. You might be able to optimize it down to a month or so. Parallelization might take it down to a week on a really strong computer.
Upvotes: 4
Reputation: 737
This is a recursion problem.
If i wanted to find the list of all sets of length n containing some letter the easiest way to think about this is the list of all sets of length n-1 that do not contain this letter concatonated with the set of [letter] for each one, to avoid duplicates you discard all elements you have previously done
For example if i wanted to find the number of two letter combinations in the set [A-F] the answer is to take each element and find the combinations of that. So say i want to find all the combinations that contain A that would then be [A][B-F] then say i want to find all the combinations that contain B but not A to continue this could be [B][C-F] Doing this for all the letters a-f will give you all possible combinations of two letters A-F so now that combination becomes your list tail for the three letter combinations.
You would add a to all the two letter combinations that do not contain A, then you would add b to all the two letter combinations that do not contain a or b, and continue this to get all the three letter combinations.
You can continue this algorithm to have as many levels as you want, and it will find the combinations of all elements of a set of a given length
I know you aren't looking for code, but here is a c# implementation
public IList<string> Combos(IList<string> elements, int level)
{
if (level == 1)
{
return elements;
}
var combinations = new List<string>();
var previousCombos = Combos(elements, level - 1);
for (var i = 0; i < elements.Count; i++)
{
previousCombos.ToList().ForEach(item =>
{
if (!elements.Take(i+1).Any(item.Contains))
{
combinations.Add(item + elements[i]);
}
});
}
return combinations;
}
Just a word of warning this is incredibly inefficient, in fact i think it is an exponential algorithm, so don't use it on large data sets or sizes, or your computation will take forever.
Upvotes: 0