Jader Dias

Reputation: 90475

What is the best way to reduce memory usage when you collect a large data set before processing it? (.NET)

When I have to fetch gigabytes of data, store it in a collection, and then process it, I run out of memory. So instead of:

 public class Program
 {
     public IEnumerable<SomeClass> GetObjects()
     {
         var list = new List<SomeClass>();
         while (/* get implementation */)
         {
             list.Add(obj); // obj is the item just fetched
         }
         return list;
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var obj in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         var objects = GetObjects();
         ProcessObjects(objects);
     }
 }

I need to:

 public class Program
 {
     void ProcessObject(SomeClass obj)
     {
         // process implementation
     }

     public void GetAndProcessObjects()
     {
         while (/* get implementation */)
         {
             ProcessObject(obj); // process each item as soon as it is fetched
         }
     }

     void Main()
     {
         GetAndProcessObjects();
     }
 }

Is there a better way?

Upvotes: 4

Views: 518

Answers (5)

Ralph Wiggum

Reputation: 699

The best approach here is to get and process the data in chunks. You will have to find a suitable chunk size by trial and error. The code would look something like this:

 public class Program
 {
     public IEnumerable<SomeClass> GetObjects(int anchor, int chunkSize)
     {
         var list = new List<SomeClass>();
         while (/* get implementation for given anchor and chunkSize */)
         {
             list.Add(obj);
         }
         return list;
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var obj in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         int chunkSize = 5000;
         int totalSize = /* get total number of rows */;
         int anchor = /* get first row to process as anchor */;
         while (anchor < totalSize)
         {
             var objects = GetObjects(anchor, chunkSize);
             ProcessObjects(objects);
             anchor += chunkSize;
         }
     }
 }
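As a runnable illustration of the chunked loop above, here is a minimal sketch. The `ChunkedProcessing` class, the `source` parameter, and the use of `int` in place of `SomeClass` are assumptions made for the example. The in-memory list only stands in for the real data source, which would instead fetch `chunkSize` rows starting at `anchor`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkedProcessing
{
    // Hypothetical stand-in for the real data source: returns at most
    // chunkSize items starting at position anchor.
    public static List<int> GetObjects(IReadOnlyList<int> source, int anchor, int chunkSize)
    {
        return source.Skip(anchor).Take(chunkSize).ToList();
    }

    // Fetches and processes one chunk at a time; returns the number of chunks read.
    public static int ProcessInChunks(IReadOnlyList<int> source, int chunkSize, Action<int> process)
    {
        int chunks = 0;
        for (int anchor = 0; anchor < source.Count; anchor += chunkSize)
        {
            // Only the current chunk is materialized as a list.
            foreach (var item in GetObjects(source, anchor, chunkSize))
            {
                process(item);
            }
            chunks++;
        }
        return chunks;
    }
}
```

With a real source, peak memory is bounded by one chunk rather than by the full data set, which is why the chunk size is worth tuning.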

Upvotes: 1

Rony

Reputation: 9511

 public IEnumerable<SomeClass> GetObjects()
 {
     foreach (var obj in GetIQueryableObjects())
         yield return obj;
 }

Upvotes: 3

Randolpho

Reputation: 56391

You want to yield!

Defer the processing of your enumeration. Write a method that returns an IEnumerable<T> but produces only one record at a time, using the yield return statement.
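To make the deferral visible, here is a minimal sketch. The `LazyDemo` class and its `Log` list are hypothetical, and `int` stands in for the record type. The log shows that each record is produced only when the consumer asks for it.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records the order of events so the interleaving can be inspected.
    public static readonly List<string> Log = new List<string>();

    // Iterator method: its body runs only as the caller requests the next item.
    public static IEnumerable<int> GetObjects(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Log.Add("produced " + i);
            yield return i;
        }
    }

    // Consumes the sequence one item at a time.
    public static void ProcessObjects(IEnumerable<int> objects)
    {
        foreach (var obj in objects)
        {
            Log.Add("processed " + obj);
        }
    }
}
```

Calling `LazyDemo.GetObjects(2)` by itself logs nothing. Only when `ProcessObjects` enumerates the result do the "produced" and "processed" entries appear, alternating one record at a time.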

Upvotes: 1

Andrew Hare

Reputation: 351476

You ought to leverage C#'s iterator blocks and use the yield return statement to do something like this:

 public class Program
 {
     public IEnumerable<SomeClass> GetObjects()
     {
         while (/* get implementation */)
         {
             yield return obj; // hand each item to the caller as soon as it is fetched
         }
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var obj in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         var objects = GetObjects();
         ProcessObjects(objects);
     }
 }

This would allow you to stream each object and not keep the entire sequence in memory - you would only need to keep one object in memory at a time.
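For a concrete version of the same pattern, here is a minimal sketch that assumes the data arrives as lines of text from a `TextReader`. The `LineStreaming` class and its method names are hypothetical. The consumer computes an aggregate without the sequence ever being collected into a list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineStreaming
{
    // Yields one line at a time; the iterator holds only the current line.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Consumes the stream without materializing it as a collection.
    public static long TotalLength(IEnumerable<string> lines)
    {
        long total = 0;
        foreach (var line in lines)
        {
            total += line.Length;
        }
        return total;
    }
}
```

The memory benefit holds only while the consumer does not keep references to earlier items. Calling `ToList()` on the result, for example, would bring the whole sequence back into memory.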

Upvotes: 9

John Saunders

Reputation: 161773

Don't use a List, which requires all the data to be present in memory at once. Use IEnumerable<T> and produce the data on demand, or better, use IQueryable<T> and have the entire execution of the query deferred until the data are required.
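The deferral can be observed with a minimal sketch. The `DeferredQueryDemo` class and its `Evaluations` counter are hypothetical, and an in-memory array wrapped with `AsQueryable()` stands in for a real query provider such as a database-backed one. Building the query evaluates nothing, and the filter runs only when the results are enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredQueryDemo
{
    // Counts how many times the filter has actually been evaluated.
    public static int Evaluations = 0;

    public static bool IsEven(int x)
    {
        Evaluations++;
        return x % 2 == 0;
    }

    // Builds the query only; nothing is evaluated until it is enumerated.
    public static IQueryable<int> BuildQuery(IQueryable<int> source)
    {
        return source.Where(x => IsEven(x));
    }
}
```

With a database-backed provider the filter would typically be translated and run at the source, so only matching rows are transferred. This in-memory stand-in shows only the timing of the evaluation.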

Alternatively, don't keep the data in memory at all, but rather save the data to a database for processing. When processing is complete, then query the database for the results.

Upvotes: 6
