Suamere

Reputation: 6248

OutOfMemory Exception for List of POCOs

Given this code:

public class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
    public List<Qualification> Qualifications { get; set; }
}

public class Qualification
{
    public QualificationType QualificationType { get; set; }
    public decimal Value { get; set; }
}

public class Action
{
    public int ActionID { get; set; }
    public int CustomerID { get; set; }
    public decimal ActionValue { get; set; }
}

public class Service : IService
{
    public List<Customer> ProcessCustomers()
    {
        List<Customer> customers = _customerService.GetCustomers(); // 250,000 Customers
        List<Action> actions = _actionService.GetActions(); // 6,000

        foreach (var action in actions) {
            foreach (var affectedCustomer in customers.Where(x => x.CustomerID < action.CustomerID)) {
                affectedCustomer.Qualifications.Add(new Qualification { QualificationType = QualificationType.Normal, Value = action.ActionValue });
            }

            foreach (var affectedCustomer in customers.Where(x => SpecialRules(x))) {
                affectedCustomer.Qualifications.Add(new Qualification { QualificationType = QualificationType.Special, Value = action.ActionValue });
            }
        }

        return customers;
    }
}

The "Most Qualified" Customer may end up with 12,000 Qualifications. On average, customers may end up with ~100 qualifications.

But I get an OutOfMemoryException (OOME) very early on, after only about 50 Actions have been processed. At that point my List still only has the 250,000 Customers in it, but roughly 5,000,000 Qualifications have been added across those Customers.

Is that a lot? It seems a bit underwhelming to me. I expected I could have tens of millions of Customers, each with an average of 1,000 Qualifications, and still be fine. I'm not even close to that.

What can I do, in code, to make this more efficient? I realize I could write the results of each Action (or of bulk groups of Actions) to a database, but I'd rather do as much in memory as possible before writing the results.


What this does is cycle through the 6,000 Actions and, for each Action, add Qualifications to some variable number of Customers. For each Action, every Customer with a CustomerID >= the Action-causing Customer's gets a Qualification added, which works out to roughly 1.2 billion records. On top of that, each Action gives 8-10 Customers a special Qualification: a tiny 60,000 records compared to the 1.2 billion.
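
For a rough sense of scale, here is a back-of-envelope estimate. The per-object size is my own assumption about 32-bit CLR layout (roughly an 8-byte object header, a 4-byte enum field, a 16-byte decimal, plus the ~4-byte reference held in the owning list), not a measurement:

using System;

class MemoryEstimate
{
    static void Main()
    {
        // Assumed per-Qualification cost on a 32-bit CLR (see note above).
        const long bytesPerQualification = 32;

        const long countAtCrash = 5000000;      // roughly where the OOME hit
        const long countPlanned = 1200000000;   // the full run described above

        Console.WriteLine(countAtCrash * bytesPerQualification / (1024.0 * 1024));        // ~152 MB
        Console.WriteLine(countPlanned * bytesPerQualification / (1024.0 * 1024 * 1024)); // ~36 GB
    }
}

Even if the per-object guess is off, the full 1.2 billion Qualifications would need tens of gigabytes, far beyond a 32-bit process; and since ~150 MB of Qualifications alone shouldn't exhaust 2 GiB, the early crash may also reflect the 250,000 per-Customer lists repeatedly doubling their internal arrays.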

I was trying to do this in memory because I don't want to be doing billions of record inserts into a database. I will need this record separation for the next step of processing, which looks at the Customers' Qualifications and at the differences across steps of CustomerIDs from top to bottom. In the end I do put the results (which are more complex than SUMs) into the database, but I can only produce those results by looking at step-by-step differences between individual Qualifications, like grading on a curve.

Upvotes: 0

Views: 70

Answers (2)

Suamere

Reputation: 6248

I've been preaching the importance of SOLID code and an explicit domain model for a long time, but it has been a couple of years since I've had to write domain logic that takes hundreds of thousands of data points into account. Here is what I have found regarding .NET OOMEs:

  1. For memory purposes, a collection of objects is not just a collection of cheap pointers to objects; the process still has to hold every object the collection references, so a collection effectively costs the sum of its parts.
  2. A 32-bit application can only use roughly 2 GiB of address space. So even if you split large collections into smaller sets of collections, you still won't be able to hold a large data set in memory.
  3. Objects don't have fixed addresses. .NET is free to move objects around unless you use unsafe code and pin them. But even if you do, individual objects are still subject to the ~2 GiB max object size (that's fine here), and the app is still subject to the ~2 GiB of address space. So building a collection of pointers is not a way out.
  4. Web applications (Web API and ASP.NET) cannot use the IMAGE_FILE_LARGE_ADDRESS_AWARE flag, or run as 64-bit applications, easily from what I can tell; I'd like to hear otherwise. (A quick way to check what the process is actually running as is sketched below.)
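
If it's unclear which of those limits applies in a given deployment, a minimal check of the process bitness helps, using the standard Environment.Is64BitProcess / Environment.Is64BitOperatingSystem APIs:

using System;

class BitnessCheck
{
    static void Main()
    {
        // A 32-bit process is capped at roughly 2 GiB of address space
        // (more with IMAGE_FILE_LARGE_ADDRESS_AWARE); a 64-bit process is not.
        Console.WriteLine($"64-bit OS:      {Environment.Is64BitOperatingSystem}");
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"Pointer size:   {IntPtr.Size * 8}-bit");
    }
}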

The unfortunate solution

I am required to break my domain model and resort to some hacks. For example: instead of a list of Qualifications that I can freely calculate over and sum, I have to have a Customer class like this:

public class Customer
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
    public decimal QualificationType1WithVariableType1Total { get; set; }
    public decimal QualificationType1WithVariableType2Total { get; set; }
    public decimal QualificationType2WithVariableType1Total { get; set; }
    public decimal QualificationType2WithVariableType2Total { get; set; }
}

This effectively does all of the calculations up front, and if I ever introduce other variables I'll have to add another "Total" property for each of them. It means that, instead of accumulating thousands of records on a Customer, each Customer carries only a half-dozen pre-calculated, meaningful fields that I can use in later calculations; a rough sketch of the reworked processing loop follows.
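
For illustration, here is roughly what the processing loop from the question looks like with this shape. It is a sketch only: the mapping of Normal/Special qualifications onto the *Total properties is simplified, and it assumes the same _customerService/_actionService fields and SpecialRules helper as the question.

public List<Customer> ProcessCustomers()
{
    List<Customer> customers = _customerService.GetCustomers(); // 250,000 Customers
    List<Action> actions = _actionService.GetActions();         // 6,000 Actions

    foreach (var action in actions) {
        foreach (var affectedCustomer in customers.Where(x => x.CustomerID < action.CustomerID)) {
            // No Qualification object is allocated; the value is folded into a running total.
            affectedCustomer.QualificationType1WithVariableType1Total += action.ActionValue;
        }

        foreach (var affectedCustomer in customers.Where(x => SpecialRules(x))) {
            affectedCustomer.QualificationType2WithVariableType1Total += action.ActionValue;
        }
    }

    return customers;
}

Memory now stays roughly proportional to the 250,000 Customers themselves, at the cost of no longer being able to look back at individual qualifications.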

So I am able to lessen my memory footprint, but I am no longer able to use my domain model explicitly and do calculations freely while looking at the full set of results.

Granted, those properties technically already existed anyway. Some were read-only and computed via LINQ from counts, averages, and sums. Some were read/write, based on the progression of other customers within 100 CustomerIDs up or down a linear chain. But now I have to throw away all of that context and work with totals only.

I'm just upset that, in this day and age, I have to break my contextual domain model to fit within the constraints of the hardware. The app was already very fast and scaled at close to O(1), so speed wasn't the issue.

Upvotes: 0

Damian

Reputation: 2852

The number of objects you are loading is really huge - you should consider processing the data in smaller chunks rather than loading it all at once.
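
For what that could look like in the question's code, here is a minimal sketch. The GetCustomers(skip, take) paging overload and the PersistResults call are hypothetical placeholders; only the parameterless GetCustomers() appears in the question.

// Sketch: process customers in fixed-size batches instead of holding all
// 250,000 (and all of their Qualifications) in memory at once.
public void ProcessCustomersInChunks()
{
    const int batchSize = 10000;
    List<Action> actions = _actionService.GetActions(); // 6,000 Actions

    for (int skip = 0; ; skip += batchSize)
    {
        // Hypothetical paging overload on the customer service.
        List<Customer> batch = _customerService.GetCustomers(skip, batchSize);
        if (batch.Count == 0)
            break;

        foreach (var action in actions) {
            foreach (var affectedCustomer in batch.Where(x => x.CustomerID < action.CustomerID)) {
                affectedCustomer.Qualifications.Add(new Qualification { QualificationType = QualificationType.Normal, Value = action.ActionValue });
            }
        }

        PersistResults(batch); // hypothetical: flush this batch's results before loading the next
    }
}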

In .NET there is a limit on the size of a single object - you can never create a single object that exceeds 2 GiB. On 64-bit, .NET 4.5 lets you lift that limit for arrays (via the gcAllowVeryLargeObjects configuration setting).

A List<T> stores its data in an array. If you load all of your data into one list, the underlying array grows past that limit and you get the OutOfMemoryException.
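
As a small standalone sketch of that array backing: List<T> doubles its internal array each time it fills up, briefly holding both the old and the new array, and pre-sizing with the capacity constructor avoids that churn when the count is known up front.

using System;
using System.Collections.Generic;

class ListGrowthDemo
{
    static void Main()
    {
        // List<T> keeps its items in one array and doubles that array whenever
        // it fills up, briefly holding both the old and the new array in memory.
        var grown = new List<int>();
        for (int i = 0; i < 1000; i++) {
            grown.Add(i);
        }
        Console.WriteLine(grown.Capacity); // 1024 - the array has doubled several times

        // Pre-sizing avoids the intermediate arrays when the count is known up front.
        var presized = new List<int>(1000);
        Console.WriteLine(presized.Capacity); // 1000
    }
}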

Upvotes: 1
