Reputation: 90475
When I have to get GBs of data, save it on a collection and process it, I have memory overflows. So instead of:
public class Program
{
public IEnumerable<SomeClass> GetObjects()
{
var list = new List<SomeClass>();
while( // get implementation
list.Add(object);
}
return list;
}
public void ProcessObjects(IEnumerable<SomeClass> objects)
{
foreach(var object in objects)
// process implementation
}
void Main()
{
var objects = GetObjects();
ProcessObjects(objects);
}
}
I need to:
public class Program
{
void ProcessObject(SomeClass object)
{
// process implementation
}
public void GetAndProcessObjects()
{
var list = new List<SomeClass>();
while( // get implementation
Process(object);
}
return list;
}
void Main()
{
var objects = GetAndProcessObjects();
}
}
There is a better way?
Upvotes: 4
Views: 518
Reputation: 699
The best methodology in this case would be to Get and Process in chunks. You will have to find out how big a chunk to Get and Process by trial and error. So the code would be something like :
public class Program
{ public IEnumerable GetObjects(int anchor, int chunkSize) { var list = new List(); while( // get implementation for given anchor and chunkSize list.Add(object); } return list; }
public void ProcessObjects(IEnumerable<SomeClass> objects)
{
foreach(var object in objects)
// process implementation
}
void Main()
{
int chunkSize = 5000;
int totalSize = //Get Total Number of rows;
int anchor = //Get first row to process as anchor;
While (anchor < totalSize)
(
var objects = GetObjects(anchor, chunkSize);
ProcessObjects(objects);
anchor += chunkSize;
}
}
}
Upvotes: 1
Reputation: 9511
public IEnumerable<SomeClass> GetObjects()
{
foreach( var obj in GetIQueryableObjects
yield return obj
}
Upvotes: 3
Reputation: 56391
You want to yield!
Delay processing of your enumeration. Build a method that returns an IEnumerable but only returns one record at a time using the yield statement.
Upvotes: 1
Reputation: 351476
You ought to leverage C#'s iterator blocks and use the yield return
statement to do something like this:
public class Program
{
public IEnumerable<SomeClass> GetObjects()
{
while( // get implementation
yield return object;
}
}
public void ProcessObjects(IEnumerable<SomeClass> objects)
{
foreach(var object in objects)
// process implementation
}
void Main()
{
var objects = GetObjects();
ProcessObjects(objects);
}
}
This would allow you to stream each object and not keep the entire sequence in memory - you would only need to keep one object in memory at a time.
Upvotes: 9
Reputation: 161773
Don't use a List, which requires all the data to be present in memory at once. Use IEnumerable<T>
and produce the data on demand, or better, use IQueryable<T>
and have the entire execution of the query deferred until the data are required.
Alternatively, don't keep the data in memory at all, but rather save the data to a database for processing. When processing is complete, then query the database for the results.
Upvotes: 6