sapbucket
sapbucket

Reputation: 7215

How to generate all combinations with replacement using recursive yield? C#

The goal here is to generate all combinations with replacement using recursion such that it does not exceed RAM. A yield operator is designed for this. I want to use a yield operator because if I do not I will exceed available RAM. The number of elements I am combining causes there to be many billions of combinations.

I determined how to generate all combinations without replacement using recursive yield. Here are the examples that I wrote:

public static IEnumerable<int[]> GetCombinationsWithYield(this int[] elements, int length)
{
   return Combinations2(elements, length, 0, new int[length]);
}

private static IEnumerable<int[]> Combinations2(int[] input, int len, int startPosition, int[] result)
{
    if (len == 0)
    {
        yield return result;
    }
    else
    {
        for (int i = startPosition; i <= input.Length - len; i++)
        {
            result[result.Length - len] = input[i];

            //  need to return the results of the recursive call
            foreach (var combination in Combinations2(input, len - 1, i + 1, result))
            {
                yield return combination;
            }
        }
    }
}

And you can test this using the following unit test:

[Test]
public void CombinationsUsingArraysOnlyWithYield()
{
    // use this method when RAM consumption is a concern.

    int[] items = {1, 2, 3};

    var results = new int[3][];
    for (int i = 0; i < results.Length; i++)
        results[i] = new int[2];

    int index = 0;

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    // i only copy the results in to an array so that I don't benchmark Console.WriteLine stuff.
    // for this to be truly useful, you would not want to copy the results.
    foreach (var result in items.GetCombinationsWithYield(2))
    {
        Array.Copy(result, results[index], 2);
        index++;
    }

    stopwatch.Stop();

    for (int i = 0; i < results.GetLength(0); i++)
    {
        string output = "";
        for (int j = 0; j < results[i].Length; j++)
            output += results[i][j] + ",";
        Console.WriteLine(output);
    }

    Console.WriteLine("elapsed: " + stopwatch.ElapsedTicks + "[ticks]");
}

The output is:

1,2,
1,3,
2,3,
elapsed: 56597[ticks]

but as you can see, that example is without replacement.

On the other hand, I would like to use this with replacement, so that the output looks like this:

1,1,
1,2,
1,3,
2,1,
2,2,
2,3,
3,1,
3,2,
3,3,

Which I was able to achieve using a Linq solution. But it does not utilize the yield operator and overflows my RAM. Here is that solution:

public static List<List<T>> GetCombinations<T>(this IEnumerable<T> pool, int comboLength, bool isWithReplacement)
{
    if (isWithReplacement)
        return GetCombinations(pool, comboLength).Select(c => c.ToList()).ToList();

}

private static IEnumerable<IEnumerable<T>> GetCombinations<T>(IEnumerable<T> list, int length)
{
    if (length == 1) return list.Select(t => new[] {t});

    return GetCombinations(list, length - 1).SelectMany(t => list, (t1, t2) => t1.Concat(new[] {t2}));
}

Any assistance would be greatly appreciated. Someone with intimate knowledge of Knuth algorithms would be ideal.

Upvotes: 0

Views: 1750

Answers (2)

Servy
Servy

Reputation: 203812

The LINQ operations you're using are implemented internally using iterator blocks. You're essentially asking for a solution that inlines those operations into the solution. That is going to cause the exact same problem as your current solution. It results in you creating a ton of expensive state machines, which are all being discarded with very little use. To avoid the extremely high memory footprints you need to avoid creating so many state machines in the first place. Writing a recursive iterator block isn't going to accomplish that. Writing an iterative, rather than recursive, solution (whether it's an iterator block or not), would be a way to accomplish that.

An iterative solution is quite simple, and has a memory footprint that's constant. You simply need to compute the number of combinations that there are, and then, for each of them, compute the combination with that unique index (which is simple enough to do).

private static IEnumerable<IEnumerable<T>> GetCombinations<T>(IList<T> list, int length)
{
    var numberOfCombinations = (long)Math.Pow(list.Count, length);
    for(long i = 0; i < numberOfCombinations; i++)
    {
        yield return BuildCombination(list, length, i);
    }
}
private static IEnumerable<T> BuildCombination<T>(
    IList<T> list, 
    int length, 
    long combinationNumber)
{
    var temp = combinationNumber;
    for(int j = 0; j < length; j++)
    {
        yield return list[(int)(temp % list.Count)];
        temp /= list.Count;
    }
}

Upvotes: 2

Dave Greilach
Dave Greilach

Reputation: 895

I think you pretty much have the solution already. I made a couple of small modifications to your code

public static IEnumerable<List<T>> GetCombinations<T>(IEnumerable<T> pool, int comboLength, 
bool isWithReplacement) // changed this to return an enumerable
{
     foreach (var list in GetCombinations(pool, comboLength).Select(c => c.ToList()))
     {
             yield return list; // added a yield return of the list instead of returning a to list of the results
     }
}

private static IEnumerable<IEnumerable<T>> GetCombinations<T>(IEnumerable<T> list, int length)
{
    if (length == 1) return list.Select(t => new[] { t });

    return GetCombinations(list, length - 1).SelectMany(t => list, (t1, t2) => t1.Concat(new[] { t2 }));
}

I tested with this:

    List<int> items = new List<int>();   

    for (int i = 1; i < 100; i++)
    {
        items.Add(i);
    }
    Stopwatch s = new Stopwatch();
    s.Start();
    int count = 0;
    foreach (var list in GetCombinations(items, 4))
    {
        count++;
    }
    s.Stop();
    Console.WriteLine(count);
    Console.WriteLine(s.ElapsedMilliseconds);
    Console.ReadLine();

This ran fine with no memory issues in 7587 ms and generated 96,059,601 combinations.

Upvotes: 0

Related Questions