swiftgp

Reputation: 1015

Dictionary with initial capacity

I read that initializing a dictionary with initial capacity may lead to better performance if the size could be estimated.

Dim MyDataTable As New DataTable

'Fill MyDataTable from Database

Dim CapacityEstimate As Integer = MyDataTable.Rows.Count
Dim MyDictionary As New Dictionary(Of String, MyObjectType)(CapacityEstimate)

'Fill the Dictionary independent of data table

The CapacityEstimate variable is just an estimate (generally in the range of 2500 to 7000) of the number of key/value pairs the dictionary should hold. So if I estimate it at 4000 and end up with 4010 objects (I may go over or under, I can't be certain), would the dictionary use a lot of memory to resize with that many key/value pairs already in it? Is this a good solution, or am I better off not specifying an initial capacity? Thanks.

EDIT: Related but not the same - Should a .NET generic dictionary be initialised with a capacity equal to the number of items it will contain?

Upvotes: 1

Views: 3541

Answers (4)

Gerardo Recinto

Reputation: 21

I know this may be late, but it may still be of value to anybody who reads this. The reason it is better to set the capacity to a known value is to prevent re-allocation. In a highly busy, 24x7 service or app where memory utilization is already under extreme pressure, you want to avoid adding to that pressure; setting the capacity to a known (or average/estimated) size prevents memory re-allocation.

Re-allocation can leave (small) holes in the memory space, which leads to memory fragmentation. There can come a time when, even though plenty of memory is still free, fragmentation leaves no contiguous block large enough, and your app experiences "out of memory" conditions.

This observation held up to .NET 4.5.1, which I believe is when I last tested it. If a newer framework version has a better garbage collector that compacts memory frequently enough to alleviate fragmentation, or at least make it a minor issue, then this will not matter much.
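For what it's worth, 4.5.1 itself added a way to request a one-off compaction of the large object heap (where arrays over roughly 85 KB, such as the bucket arrays of very large dictionaries, end up). A minimal sketch, assuming .NET Framework 4.5.1 or later:

    using System;
    using System.Runtime;

    class LohCompactionDemo
    {
        static void Main()
        {
            // Ask the GC to compact the large object heap on the next
            // blocking gen-2 collection (available since .NET 4.5.1).
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();

            // The setting resets itself to Default once the compacting
            // collection has run.
            Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode);
        }
    }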

Upvotes: 1

Hans Passant

Reputation: 941218

Don't worry about the small stuff. A dictionary like that does not use a lot of memory so having it resize itself can't take a lot of memory either. The real storage is the objects for the key and data, the dictionary only contains references to them. 8 bytes per entry in 32-bit mode so that's only 4000 x 8 + some overhead = 32 kilobytes.
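If you want to sanity-check that arithmetic yourself, here's a rough sketch (GC.GetTotalMemory is only approximate, but fine for an order-of-magnitude check; the measured number includes the internal bucket and entry arrays, so it comes out somewhat larger than the bare 8 bytes per entry):

    using System;
    using System.Collections.Generic;

    class DictionaryMemoryDemo
    {
        static void Main()
        {
            // Allocate the key strings up front so only the dictionary's
            // own internal arrays show up in the measurement below.
            var keys = new string[4000];
            for (int i = 0; i < keys.Length; i++) keys[i] = i.ToString();

            long before = GC.GetTotalMemory(forceFullCollection: true);

            var dict = new Dictionary<string, int>(4000);
            for (int i = 0; i < keys.Length; i++) dict.Add(keys[i], i);

            long after = GC.GetTotalMemory(forceFullCollection: true);

            // Roughly tens of kilobytes; the key strings are excluded.
            Console.WriteLine("Dictionary overhead: ~{0:N0} bytes", after - before);
            GC.KeepAlive(dict);
        }
    }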

Furthermore, the capacity you pass is used to calculate the number of hash buckets in the dictionary, which is always a prime number equal to or larger than the capacity you specified. The prime numbers are picked from this array (copied from the Reference Source):

    internal static readonly int[] primes = {
        3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
        1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
        17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
        187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
        1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};
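In code, the lookup is simply "first prime in that table that is at least the requested capacity". A sketch of what HashHelpers.GetPrime does (the real method is internal, and it also computes primes on the fly for capacities beyond the table, so this is a reimplementation rather than the actual API):

    using System;

    class BucketCountDemo
    {
        // Truncated copy of the table above; the full list is in the
        // Reference Source.
        static readonly int[] primes = {
            3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197,
            239, 293, 353, 431, 521, 631, 761, 919, 1103, 1327, 1597, 1931,
            2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419 };

        // First prime that is >= the requested capacity.
        static int GetBucketCount(int capacity)
        {
            foreach (int p in primes)
                if (p >= capacity) return p;
            throw new ArgumentOutOfRangeException("capacity");
        }

        static void Main()
        {
            Console.WriteLine(GetBucketCount(4000)); // 4049
            Console.WriteLine(GetBucketCount(4050)); // 4861
            Console.WriteLine(GetBucketCount(8100)); // 8419: a resize from 4049 doubles to 8098, then rounds up
        }
    }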

So if you pass 4000 then you'll actually get 4049 buckets, the next largest prime, and overshooting to 4010 isn't going to make a difference. If the dictionary does need to resize, it doubles the capacity, so a single resize would already produce 8419 buckets, well past your maximum estimate. Resizing isn't very expensive either, a couple of microseconds, which is why Andre couldn't see a difference.

Which is, rather than reasoning about it, the proper approach. Measure. Anybody can measure.

Upvotes: 4

Andre Calil

Reputation: 7692

That's a good question. I hadn't looked into it before, but Oded's answer seems right to me.

However, let's run a conceptual micro-benchmark on it:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        static void Main()
        {
            var FixedCapacity = new Dictionary<string, int>(4000);
            var NotFixedCapacity = new Dictionary<string, int>();

            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();

            // 5000 adds against a dictionary pre-sized for 4000 entries.
            for (int i = 0; i < 5000; i++)
            {
                FixedCapacity.Add(i.ToString(), i);
            }

            stopWatch.Stop();
            Console.WriteLine("Fixed initial capacity: {0} ms", stopWatch.ElapsedMilliseconds);

            stopWatch.Reset();
            stopWatch.Start();

            // The same 5000 adds against a dictionary with default capacity.
            for (int i = 0; i < 5000; i++)
            {
                NotFixedCapacity.Add(i.ToString(), i);
            }

            stopWatch.Stop();
            Console.WriteLine("Not fixed initial capacity: {0} ms", stopWatch.ElapsedMilliseconds);

            Console.ReadLine();
        }
    }

Results:

Fixed initial capacity: 1 ms
Not fixed initial capacity: 1 ms

That's another good answer, IMO =)

Disclaimer: no, this is not a full benchmark procedure; I'm just measuring the framework's default behavior on a single machine. I ran it manually several times and got the same results, even though it's not in a loop. If you have a better benchmark tool, kindly test with it and post the results here.
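If someone does want to repeat this with a proper harness, here's a sketch using BenchmarkDotNet (assuming the NuGet package is installed; no numbers posted, since they depend on machine and runtime):

    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class DictionaryCapacityBenchmark
    {
        private const int N = 5000;

        [Benchmark]
        public Dictionary<string, int> FixedCapacity()
        {
            var d = new Dictionary<string, int>(N);
            for (int i = 0; i < N; i++) d.Add(i.ToString(), i);
            return d;
        }

        [Benchmark(Baseline = true)]
        public Dictionary<string, int> DefaultCapacity()
        {
            var d = new Dictionary<string, int>();
            for (int i = 0; i < N; i++) d.Add(i.ToString(), i);
            return d;
        }
    }

    class BenchmarkProgram
    {
        static void Main() => BenchmarkRunner.Run<DictionaryCapacityBenchmark>();
    }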

Upvotes: 5

Oded

Reputation: 498904

would the dictionary use a lot of memory to resize with those many key/value pairs already in it

The dictionary will only "resize" if you go over the estimated capacity.

Memory will be reserved for the number of items you have estimated - this will happen in the constructor of the Dictionary.

Now, there is a difference between the capacity and the actual size. The capacity is how many items the dictionary can hold without the internal storage being resized. The size is the number of items actually stored in the dictionary (i.e. those added to it).
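A short sketch of that distinction; note that EnsureCapacity only exists on newer runtimes (.NET Core 2.0 and later), so treat that part as an assumption about your target framework:

    using System;
    using System.Collections.Generic;

    class CapacityVsSizeDemo
    {
        static void Main()
        {
            var dict = new Dictionary<string, int>(4000);

            dict.Add("one", 1);
            dict.Add("two", 2);

            // Size: the number of items actually stored.
            Console.WriteLine(dict.Count); // 2

            // Capacity: how many items fit before the internal storage grows.
            // EnsureCapacity(0) reports the current capacity without growing;
            // it is only available on .NET Core 2.0+ / .NET 5+.
            Console.WriteLine(dict.EnsureCapacity(0)); // e.g. 4049, the first prime >= 4000
        }
    }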

Upvotes: 1
