Barka
Barka

Reputation: 8922

why am I getting a System.OutOfMemoryException

I am using Linq to Sql. Here is the code:

Dictionary<string, int> allResults;
using (var dc= new MyDataContext())
{
    dc.CommandTimeout = 0;
    allResults = dc.MyTable.ToDictionary(x => x.Text, x => x.Id);
}

it is ran on a 64 bit machine and the compilation is AnyCPU. It throws a System.OutOfMemoryException.

This accesses an SQL Server database. The Id field maps to an SQL Server int field, And the Text field maps to Text(nvarchar(max)) field. Running select COUNT(*) from TableName results in 1,173,623 records and running select sum(len(Text)) from TableName results in 48,915,031. Since int is a 32 bit integer, the ids should take only 4.69MB of space and the strings less than 1GB. So we are not even bumping against the 2GB/object limit.

I then change the code in this way:

Dictionary<string, int> allResults;
using (var dc = new MyDataContext())
{
   Dictionary<string, int> threeHundred;
   dc.CommandTimeout = 0;
   var tot = dc.MyTable.Count();
   allResults = new Dictionary<string, int>(tot);
   int skip = 0;
   int takeThis = 300000;
   while (skip < tot)
   {
      threeHundred = dc.MyTable.Skip(skip).Take(takeThis).ToDictionary(x => x.Text, x => x.Id);
      skip = skip + takeThis;
      allResults = allResults.Concat(threeHundred).ToDictionary(x => x.Key, x => x.Value);
      threeHundred = null;
      GC.Collect();
    }
}

I learn that garbage collaction here does not help and that the out of memory exception is thrown on the first line in the while loop once skip = 900,000.

What is wrong and how do I fix this?

Upvotes: 2

Views: 1708

Answers (1)

Ofir Makmal
Ofir Makmal

Reputation: 501

Without getting into your calculations of how much it should take in memory (as there could be issues of encoding that could easily double the size of the data), I'll try to give a few pointers.

Starting with the cause of the issue - my guess is that the threeHundred dictionary is causing a lot of allocations. When you add items to a dictionary like above, the dictionary won't be able to know how many items it should pre-allocated. Which will cause a massive re-allocation and coping of all data to newly created dictionaries. Please set a size (using the ctor) to the threeHundred dictionary before adding any items to it.

Please read this article I've published which goes in-depth into Dictionary internals - I'm sure it will shed some light on those symptoms. http://www.codeproject.com/Articles/500644/Understanding-Generic-Dictionary-in-depth

In addition, when trying to populate this large amount of data, I suggest to fully control the process. My suggestion:

  • Pre-allocate slots in the Dictionary (using a Count query directly on the DB, and passing it to the Dictionary ctor)
  • Work with DataReader for populating those items without loading all of the query result into memory. If you know for a fact (which is VERY important to know this in advance) - think of using string.Intern - only if there are many duplicated items! - you should test to see how it is working
  • Memory-profile the code - you should only see ONE allocation for the Dictionary, and strings as the amount of the items from the query (int is value type - therefor it is not allocated on the heap as an object, but instead it sits inside the Dictionary.

Either way, you should check if you are running on 32 bit or 64 bit. .Net 4.5 prefers 32 bit. (check it on Task Manager or the project properties)

Hope this helps, Ofir.

Upvotes: 1

Related Questions