Reputation: 389
I was wondering: is there an elegant way to remove multiple items from a generic collection (in my case, a List<T>) without doing something such as specifying a predicate in a LINQ query to find the items to delete?
I'm doing a bit of batch processing, in which I'm filling a List<T> with Record objects that need to be processed. This processing concludes with each object being inserted into a database. Instead of building the list and then looping through each individual member to process and insert it, I want to perform transactional bulk inserts with groups of N items from the list, because that is less resource intensive (where N represents the BatchSize that I can put in a config file, or equivalent).
I'm looking to do something like:
public void ProcessRecords()
{
    // listOfRecords will be a List<Record>
    var listOfRecords = GetListOfRecordsFromDb( _connectionString );
    var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );
    do
    {
        // ToList() materializes the batch so it is not re-evaluated after the list changes
        var recordSubset = listOfRecords.Take(batchSize).ToList();
        DoProcessingStuffThatHappensBeforeInsert( recordSubset );
        InsertBatchOfRecords( recordSubset );
        // now I want to remove the objects in recordSubset from the original list,
        // so that afterwards listOfRecords.Count has shrunk by recordSubset.Count
    } while( listOfRecords.Any() );
}
I'm looking for a way to do this all at once, instead of iterating through the subset and removing the items one at a time, such as:
foreach(Record rec in recordSubset)
{
    if( listOfRecords.Contains(rec) )
    {
        listOfRecords.Remove(rec);
    }
}
I was looking at using List<T>.RemoveRange(0, batchSize), but wanted to get some StackOverflow feedback first :) What methods do you use to maximize the efficiency of your batch processing algorithms in C#?
Any help/suggestions/hints are much appreciated!
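For reference, the RemoveRange idea could look roughly like the sketch below. Note that List<T>.RemoveRange takes a start index and a count; there is no single-argument overload. `BatchRemoval` and `ProcessAndRemoveInBatches` are hypothetical names, and the `processBatch` delegate is an assumed stand-in for `DoProcessingStuffThatHappensBeforeInsert` plus `InsertBatchOfRecords`.

```csharp
using System;
using System.Collections.Generic;

public static class BatchRemoval
{
    // Hypothetical helper: copies the first batchSize items with GetRange,
    // hands them to processBatch, then removes them with a single RemoveRange call.
    public static int ProcessAndRemoveInBatches<T>(
        List<T> items, int batchSize, Action<List<T>> processBatch)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int batches = 0;
        while (items.Count > 0)
        {
            // The last batch may hold fewer than batchSize items.
            int count = Math.Min(batchSize, items.Count);

            // GetRange returns a shallow copy, so the batch stays valid after RemoveRange.
            List<T> batch = items.GetRange(0, count);
            processBatch(batch);

            // Remove the processed items from the front of the list in one call.
            items.RemoveRange(0, count);
            batches++;
        }
        return batches;
    }
}
```

Removing from the front of a List<T> shifts every remaining element down, so each RemoveRange(0, count) call costs time proportional to the items still in the list; for large lists, walking the list by offset without removing anything avoids that cost.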
Upvotes: 1
Views: 4380
Reputation: 236228
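With an extension method (it has to live in a static class):
using System.Collections.Generic;

public static class BatchExtensions
{
    public static IEnumerable<List<T>> ToBatches<T>(this IEnumerable<T> source, int batchSize)
    {
        List<T> batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                // A full batch is ready: hand it out and start a new one.
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Return the final partial batch, but never an empty one.
        if (batch.Count > 0)
            yield return batch;
    }
}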
Then you can split the input sequence into batches:
foreach(var batch in listOfRecords.ToBatches(batchSize))
{
    DoProcessingStuffThatHappensBeforeInsert(batch);
    InsertBatchOfRecords(batch);
}
Upvotes: 3
Reputation: 50114
MoreLINQ has a Batch extension method that would allow you to call:
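// requires the MoreLINQ package and a `using MoreLinq;` directive
var listOfRecords = GetListOfRecordsFromDb( _connectionString );
var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );
foreach(var batch in listOfRecords.Batch(batchSize))
{
    // each batch is a sequence of at most batchSize records, so the two
    // methods below need to accept IEnumerable<Record>
    DoProcessingStuffThatHappensBeforeInsert(batch);
    InsertBatchOfRecords(batch);
}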
You wouldn't need to bother taking items out of listOfRecords.
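On .NET 6 or later, LINQ itself ships a comparable operator, Enumerable.Chunk, which returns each batch as an array. This is a swapped-in alternative to MoreLINQ, not part of that library. The sketch below (`ChunkExample` and `BatchSizes` are hypothetical names) only records each batch's size to show how the remainder is handled; in the question's code the loop body would call the processing and insert methods instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkExample
{
    // Hypothetical helper: returns the size of each batch produced by
    // Enumerable.Chunk (available in .NET 6 and later).
    public static List<int> BatchSizes<T>(IEnumerable<T> source, int batchSize)
    {
        var sizes = new List<int>();
        foreach (T[] batch in source.Chunk(batchSize))
        {
            // Every batch is an array; the final one holds any remainder.
            sizes.Add(batch.Length);
        }
        return sizes;
    }
}
```

For ten items and a batch size of 4 this yields batches of 4, 4 and 2, and, like the MoreLINQ version, it leaves the source list untouched.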
Upvotes: 1