Reputation: 193
How do you concatenate huge lists without doubling memory?
Consider the following snippet:
Console.WriteLine($"Initial memory size: {Process.GetCurrentProcess().WorkingSet64 /1024 /1024} MB");
int[] a = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
int[] b = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
Console.WriteLine($"Memory size after lists initialization: {Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024} MB");
List<int> concat = new List<int>();
concat.AddRange(a.Skip(500 * 1024 * 1024 / 4));
concat.AddRange(b.Skip(500 * 1024 * 1024 / 4));
Console.WriteLine($"Memory size after lists concatenation: {Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024} MB");
The output is:
Initial memory size: 12 MB
Memory size after lists initialization: 2014 MB
Memory size after lists concatenation: 4039 MB
I would like to keep memory usage to 2014 MB after concatenation, without modifying a and b.
Upvotes: 7
Views: 1573
Reputation: 32780
They are persistent. I just need to concatenate them, do some searching and then dispose the concatenated list
If you only need to do some searching, why do you need to concatenate in the first place? Search both arrays separately.
It could be the case that what you are searching might bridge both arrays. If that is the case, to make things easier and not pay the memory price, simply implement a wrapper that simulates the operation but doesn't actually perform it:
sealed class Concatenated<T>:
IReadOnlyList<T>
{
public static Concatenated<T>
Concatenate<T>(
IReadOnlyList<T> first,
IReadOnlyList<T> second)
=> new ConcatenatedArray<T>(first, second);
private readonly IReadOnlyList<T>
first, second;
private Concatenated(
IReadOnlyList<T> first,
IReadOnlyList<T> second)
{
this.first = first;
this.second = second;
}
public T this[int index]
=> index < first.Length ?
first[index]:
second[index - first.Length];
public int Count => first.Length + second.Length;
public IEnumerator<T> GetEnumerator()
{
foreach (var f in first)
yield return f;
foreach (var s in second)
yield return s;
}
IEnumerator IEnumerable.GetEnumerator()
=> GetEnumerator();
}
Upvotes: 4
Reputation: 179
As InBetween mentions, you really shouldn't make a new list. I imagine his solution is what is the "best" solution.
In terms of answering your initial question, you're going to have issues due to how Garbage Collection works with .NET (https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals).
In order to get around this, the best way will be to not use any of the built in containers to allow yourself to have full control of you memory usage without using any classes and allocating everything to the stack.
Here is some examples handling the allocations to have closer control over memory due to scoping:
void MyFunc(IList<int> combinedList)
{
int[] a = new int[LARGE_COUNT]; // This will initialize to the default value of the type. (default)int == 0
int[] b = new int[LARGE_COUNT];
// Add whatever you want to combinedList. This will just add both.
combinedList.AddRange(a);
combinedList.AddRange(b);
}
The section above will have a and b disposed of immediately due to them being stack allocations without using any classes. This will make proper use of the Garbage Collection difference in structs vs classes.
There is another way to do it a bit more heavy-handedly.
List<int> concat = new List<int>();
using (int[] a = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray()){
concat.AddRange(a.Skip(500 * 1024 * 1024 / 4));
}
using (int[] b = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray()){
concat.AddRange(b.Skip(500 * 1024 * 1024 / 4));
}
// Do a GC.Collect() if you really don't want to put this in it's own scope for some reason.
The GC.Collect() is a very aggressive way to get around learning the proper way .NET's garbage collection properly is setup to work.
Upvotes: 3
Reputation: 9526
Use Enumerable.Concat()
. In the source, you can see that ConcatIterator
first yields all items from first
and then from second
. It does not copy the original IEnumerables
(or arrays in this case), it uses references.
(NOTE: for maximum speed and many small IEnumerables
you should not do this, but for minimum memory consumption and a few large IEnumerables
this works)
Upvotes: 3
Reputation: 1503419
If you need a List<int>
, you can't do this. A List<int>
always contains its data directly, so by the time you've got two arrays with (say) 100 elements, and a list which was created by concatenating those two, you've got 400 independent elements. You can't change that.
What you're looking for is a way of not creating an independent copy of the data. If you're just searching through it (as it sounds like in the comments) you can just use an IEnumerable<int>
created with LINQ:
IEnumerable<int> concat = a.Concat(b);
If you needed something like an IReadOnlyList<T>
or even an IList<T>
, you could implement those interfaces yourself to create an adapter over multiple arrays - but you'd probably need to write that yourself. If you can stick with IEnumerable<T>
, using LINQ will be a lot simpler.
Upvotes: 10
Reputation: 3559
I can suggest you performing some optimizations:
Initialize a and b as IEnumerable<int>
without call ToArray() method
int size = 1000 * 1024 * 1024 / 4;
IEnumerable<int> a = Enumerable.Range(0, size);
IEnumerable<int> b = Enumerable.Range(0, size);
Initialize concat with a known capacity
List<int> concat = new List<int>(size);
As the result I get the following output:
Initial memory size: 12 MB
Memory size after lists initialization: 13 MB
Memory size after lists concatenation: 1021 MB
If you just want to search something in concatenation you can do it like this without extra allocations:
IEnumerable<int> concat = a.Skip(500 * 1024 * 1024 / 4).Concat(b.Skip(500 * 1024 * 1024 / 4));
int search = concat.Count(i => i % 2 == 0);
Console.WriteLine($"Search result: {search}");
Upvotes: 5