Chris
Chris

Reputation: 193

How to concatenate lists without using extra memory?

How do you concatenate huge lists without doubling memory?

Consider the following snippet:

 Console.WriteLine($"Initial memory size: {Process.GetCurrentProcess().WorkingSet64 /1024 /1024} MB");
 int[] a = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
 int[] b = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
 Console.WriteLine($"Memory size after lists initialization: {Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024} MB");
 List<int> concat = new List<int>();
 concat.AddRange(a.Skip(500 * 1024 * 1024 / 4));
 concat.AddRange(b.Skip(500 * 1024 * 1024 / 4));
 Console.WriteLine($"Memory size after lists concatenation: {Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024} MB");

The output is:

Initial memory size: 12 MB
Memory size after lists initialization: 2014 MB
Memory size after lists concatenation: 4039 MB

I would like to keep memory usage to 2014 MB after concatenation, without modifying a and b.

Upvotes: 7

Views: 1573

Answers (5)

InBetween
InBetween

Reputation: 32780

They are persistent. I just need to concatenate them, do some searching and then dispose the concatenated list

If you only need to do some searching, why do you need to concatenate in the first place? Search both arrays separately.

It could be the case that what you are searching might bridge both arrays. If that is the case, to make things easier and not pay the memory price, simply implement a wrapper that simulates the operation but doesn't actually perform it:

sealed class Concatenated<T>:
    IReadOnlyList<T>
{
    public static Concatenated<T> 
        Concatenate<T>(
            IReadOnlyList<T> first,
            IReadOnlyList<T> second)
        => new ConcatenatedArray<T>(first, second);

    private readonly IReadOnlyList<T>
       first, second;

    private Concatenated(
        IReadOnlyList<T> first,
        IReadOnlyList<T> second)
    {
        this.first = first;
        this.second = second;
    }

    public T this[int index] 
        => index < first.Length ? 
           first[index]: 
           second[index - first.Length];

    public int Count => first.Length + second.Length;

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var f in first)
            yield return f;

        foreach (var s in second)
            yield return s;
    }

    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();
}

Upvotes: 4

Travis Primm
Travis Primm

Reputation: 179

As InBetween mentions, you really shouldn't make a new list. I imagine his solution is what is the "best" solution.

In terms of answering your initial question, you're going to have issues due to how Garbage Collection works with .NET (https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals).

In order to get around this, the best way will be to not use any of the built in containers to allow yourself to have full control of you memory usage without using any classes and allocating everything to the stack.

Here is some examples handling the allocations to have closer control over memory due to scoping:

    void MyFunc(IList<int> combinedList)
    {
      int[] a = new int[LARGE_COUNT]; // This will initialize to the default value of the type. (default)int == 0
      int[] b = new int[LARGE_COUNT];

      // Add whatever you want to combinedList. This will just add both.
      combinedList.AddRange(a);
      combinedList.AddRange(b);
    }

The section above will have a and b disposed of immediately due to them being stack allocations without using any classes. This will make proper use of the Garbage Collection difference in structs vs classes.

There is another way to do it a bit more heavy-handedly.

    List<int> concat = new List<int>();
    using (int[] a = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray()){
        concat.AddRange(a.Skip(500 * 1024 * 1024 / 4));
    }
    using (int[] b = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray()){
        concat.AddRange(b.Skip(500 * 1024 * 1024 / 4));
    }
    // Do a GC.Collect() if you really don't want to put this in it's own scope for some reason.

The GC.Collect() is a very aggressive way to get around learning the proper way .NET's garbage collection properly is setup to work.

Upvotes: 3

Kasper van den Berg
Kasper van den Berg

Reputation: 9526

Use Enumerable.Concat(). In the source, you can see that ConcatIterator first yields all items from first and then from second. It does not copy the original IEnumerables (or arrays in this case), it uses references.
(NOTE: for maximum speed and many small IEnumerables you should not do this, but for minimum memory consumption and a few large IEnumerables this works)

Upvotes: 3

Jon Skeet
Jon Skeet

Reputation: 1503419

If you need a List<int>, you can't do this. A List<int> always contains its data directly, so by the time you've got two arrays with (say) 100 elements, and a list which was created by concatenating those two, you've got 400 independent elements. You can't change that.

What you're looking for is a way of not creating an independent copy of the data. If you're just searching through it (as it sounds like in the comments) you can just use an IEnumerable<int> created with LINQ:

IEnumerable<int> concat = a.Concat(b);

If you needed something like an IReadOnlyList<T> or even an IList<T>, you could implement those interfaces yourself to create an adapter over multiple arrays - but you'd probably need to write that yourself. If you can stick with IEnumerable<T>, using LINQ will be a lot simpler.

Upvotes: 10

chronoxor
chronoxor

Reputation: 3559

I can suggest you performing some optimizations:

  1. Initialize a and b as IEnumerable<int> without call ToArray() method

    int size = 1000 * 1024 * 1024 / 4;
    IEnumerable<int> a = Enumerable.Range(0, size);
    IEnumerable<int> b = Enumerable.Range(0, size);
    
  2. Initialize concat with a known capacity

    List<int> concat = new List<int>(size);
    

As the result I get the following output:

Initial memory size: 12 MB
Memory size after lists initialization: 13 MB
Memory size after lists concatenation: 1021 MB

If you just want to search something in concatenation you can do it like this without extra allocations:

IEnumerable<int> concat = a.Skip(500 * 1024 * 1024 / 4).Concat(b.Skip(500 * 1024 * 1024 / 4));
int search = concat.Count(i => i % 2 == 0);
Console.WriteLine($"Search result: {search}");

Upvotes: 5

Related Questions