Eric Stallcup

Reputation: 389

Remove Multiple Elements From List<T>

I was wondering, is there an elegant way to remove multiple items from a generic collection (in my case, a List<T>) without doing something such as specifying a predicate in a LINQ query to find the items to delete?

I'm doing a bit of batch processing, in which I'm filling a List<T> with Record objects that need to be processed. This processing concludes with each object being inserted into a database. Instead of building the list and then looping through each individual member to process and insert it, I want to perform transactional bulk inserts with groups of N items from the list, because that's less resource-intensive (where N is the BatchSize that I can put in a config file, or equivalent).

I'm looking to do something like:

public void ProcessRecords()
{
    // listOfRecords will be a List<Record>
    var listOfRecords = GetListOfRecordsFromDb( _connectionString );
    var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );

    do
    {
       var recordSubset = listOfRecords.Take(batchSize);
       DoProcessingStuffThatHappensBeforeInsert( recordSubset );

       InsertBatchOfRecords( recordSubset );

       // now I want to remove the objects added to recordSubset from the original list
       // the size of listOfRecords afterwards should be listOfRecords.Count - batchSize
    } while( listOfRecords.Any() );
}

I'm looking for a way to do this all at once, instead of iterating through the subset and removing the items that way, such as:

foreach(Record rec in recordSubset)
{
   if( listOfRecords.Contains(rec) ) 
   { 
      listOfRecords.Remove(rec);
   }
}

I was looking at using List<T>.RemoveRange( 0, batchSize ), but wanted to get some StackOverflow feedback first :) What methods do you use to maximize the efficiency of your batch processing algorithms in C#?
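For reference, List<T>.RemoveRange takes a start index and a count, so removing a batch from the front of the list might look roughly like the sketch below. The helper name TakeAndRemove, the class name, and the integer sample data are assumptions made up for illustration; they stand in for the Record list in the question.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveRangeSketch
{
    // Hypothetical helper: copies up to batchSize items from the front
    // of the list, removes them from the list, and returns the copy.
    public static List<T> TakeAndRemove<T>(List<T> source, int batchSize)
    {
        // Never ask for more items than the list still holds.
        int count = Math.Min(batchSize, source.Count);

        List<T> batch = source.GetRange(0, count); // shallow copy of the first items
        source.RemoveRange(0, count);              // remove them in a single call
        return batch;
    }

    public static void Main()
    {
        var records = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

        while (records.Count > 0)
        {
            List<int> batch = TakeAndRemove(records, 3);
            Console.WriteLine(string.Join(",", batch));
        }
        // prints:
        // 1,2,3
        // 4,5,6
        // 7
    }
}
```

Note that removing from the front of a List<T> shifts every remaining element, so each call costs time proportional to what is left in the list. For large lists, iterating in batches without removing anything is likely cheaper.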

Any help/suggestions/hints are much appreciated!

Upvotes: 1

Views: 4380

Answers (2)

Sergey Berezovskiy

Reputation: 236228

With the following extension method:

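// Extension methods must be declared in a static, non-generic, top-level class.
public static IEnumerable<List<T>> ToBatches<T>(this List<T> list, int batchSize)
{
    List<T> batch = new List<T>(batchSize);

    foreach (T item in list)
    {
        batch.Add(item);

        // Hand the batch over once it is full, then start a fresh one.
        if (batch.Count == batchSize)
        {
            yield return batch;
            batch = new List<T>(batchSize);
        }
    }

    // Return the final partial batch, but never an empty one
    // (which would happen when list.Count is a multiple of batchSize).
    if (batch.Count > 0)
    {
        yield return batch;
    }
}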

You can split the input sequence into batches:

foreach(var batch in listOfRecords.ToBatches(batchSize))
{
   DoProcessingStuffThatHappensBeforeInsert(batch);
   InsertBatchOfRecords(batch);
}
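To see what the batches look like in practice, here is a small self-contained sketch. The class names BatchExtensions and BatchDemo and the integer sample data are assumptions for illustration only. ToBatches is repeated here so the snippet compiles on its own, written with a guard so that it never yields a trailing empty batch.

```csharp
using System;
using System.Collections.Generic;

public static class BatchExtensions
{
    public static IEnumerable<List<T>> ToBatches<T>(this List<T> list, int batchSize)
    {
        List<T> batch = new List<T>(batchSize);

        foreach (T item in list)
        {
            batch.Add(item);

            // A full batch is handed to the caller and a new one is started.
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // The last batch may be smaller than batchSize; skip it if empty.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}

public static class BatchDemo
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

        foreach (List<int> batch in numbers.ToBatches(3))
        {
            Console.WriteLine(string.Join(",", batch));
        }
        // prints:
        // 1,2,3
        // 4,5,6
        // 7
    }
}
```

The original list is left untouched, so nothing has to be removed from it after each batch.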

Upvotes: 3

Rawling

Reputation: 50114

MoreLINQ has a Batch extension method that would allow you to write:

var listOfRecords = GetListOfRecordsFromDb( _connectionString );
var batchSize = Convert.ToInt32( ConfigurationManager.AppSettings["BatchSize"] );

foreach(var batch in listOfRecords.Batch(batchSize))
{
   DoProcessingStuffThatHappensBeforeInsert(batch);
   InsertBatchOfRecords(batch);
}

You wouldn't need to bother removing anything from listOfRecords.
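If adding a dependency is not an option and the project targets .NET 6 or later, the built-in Enumerable.Chunk does a similar job. This is a swapped-in alternative rather than MoreLINQ's Batch, and the class name and string sample data below are assumptions for illustration. Each chunk arrives as an array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    public static void Main()
    {
        var records = new List<string> { "a", "b", "c", "d", "e" };

        // Chunk splits the sequence into arrays of at most 2 items;
        // the last array holds whatever is left over.
        foreach (string[] batch in records.Chunk(2))
        {
            Console.WriteLine(string.Join(",", batch));
        }
        // prints:
        // a,b
        // c,d
        // e
    }
}
```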

Upvotes: 1
