Reputation: 17799

Most efficient way to remove multiple items from a IList<T>

What is the most efficient way to remove multiple items from an IList<T> object. Suppose I have an IEnumerable<T> of all the items I want to remove, in the same order of occurrence that in the original list.

The only way I have in mind is:

IList<T> items;
IEnumerable<T> itemsToDelete;
...

foreach (var x in itemsToDelete)
{
    items.Remove(x);
}

But I guess it's not efficient, because it has to go over the list from the beggining every time the method Remove is called.

Upvotes: 14

Answers (4)

Theodor Zoulias

Reputation: 43535

This problem would be easier to solve if there was available an extension method RemoveAll for the IList<T> interface. So here is one:

/// <summary>
/// Removes all the elements that match the conditions defined by the
/// specified predicate.
/// </summary>
public static int RemoveAll<T>(this IList<T> list, Func<T, int, bool> predicate)
{
    ArgumentNullException.ThrowIfNull(list);
    ArgumentNullException.ThrowIfNull(predicate);

    int i = 0, j = 0;
    try
    {
        for (; i < list.Count; i++)
        {
            if (predicate(list[i], i)) continue;
            if (j < i) list[j] = list[i];
            j++;
        }
    }
    finally
    {
        if (j < i)
        {
            for (; i < list.Count; i++, j++)
                list[j] = list[i];
            while (list.Count > j)
                list.RemoveAt(list.Count - 1);
        }
    }
    return i - j;
}

This is a modified version of a custom List<T>.RemoveAll implementation that is found in this answer. Because of the absence of the RemoveRange method in the IList<T> interface, the rightmost leftover slots in the IList<T> are cleared with repeated removals of the last element. This should be a pretty fast operation in most IList<T> implementations.

Now the original problem of removing multiple items from a IList<T> can be solved efficiently like this:

IList<T> items;
IEnumerable<T> itemsToDelete;
//...

HashSet<T> itemsToDeleteSet = new(itemsToDelete);
items.RemoveAll((x, _) => itemsToDeleteSet.Contains(x));

Online demo.

Upvotes: 0

Eren Ersönmez

Reputation: 39085

As the number of items to remove gets larger, you will probably find traversing the list and checking each item against a hashset of "items to remove" is more efficient. An extension method like this might help:

static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
{
    var set = new HashSet<T>(itemsToRemove);

    var list = iList as List<T>;
    if (list == null)
    {
        int i = 0;
        while (i < iList.Count)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i++;
        }
    }
    else
    {
        list.RemoveAll(set.Contains);
    }
}

I benchmarked using this little program below. (Note that it uses an optimized path if IList<T> is actually a List<T>.)

On my machine (and using my test data), this extention method took 1.5 seconds to execute vs 17 seconds for the code in your question. However, I have not tested with different sizes of data. I'm sure for removing just a couple of items RemoveAll2 will be faster.

static class Program
{
    static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
    {
        var set = new HashSet<T>(itemsToRemove);

        var list = iList as List<T>;
        if (list == null)
        {
            int i = 0;
            while (i < iList.Count)
            {
                if (set.Contains(iList[i])) iList.RemoveAt(i);
                else i++;
            }
        }
        else
        {
            list.RemoveAll(set.Contains);
        }
    }

    static void RemoveAll2<T>(this IList<T> list, IEnumerable<T> itemsToRemove)
    {
        foreach (var item in itemsToRemove)
            list.Remove(item);
    }

    static void Main(string[] args)
    {
        var list = Enumerable.Range(0, 10000).ToList();
        var toRemove = new[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
                              43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,  97, 101,
                             103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
                             173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239,
                             241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
                             317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397,
                             401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467,
                             479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569,
                             571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643,
                             647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
                             739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823,
                             827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
                             919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997};
        list.RemoveAll(toRemove); // JIT 
        //list.RemoveAll2(toRemove); // JIT 

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
        {
            list.RemoveAll(toRemove);
            //list.RemoveAll2(toRemove);
        }
        sw.Stop();
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        Console.ReadKey();
    }
}

UPDATE (for @KarmaEDV's & Mark Sowul's comments below): If you need to use a custom equality comparer, the extension method could have an overload that takes such a comparer:

public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove, IEqualityComparer<T> comparer = null)
{
    var set = new HashSet<T>(itemsToRemove, comparer ?? EqualityComparer<T>.Default);

    if (iList is List<T>)
    {
        list.RemoveAll(set.Contains);
    }
    else
    {
        int i = iList.Count - 1;
        while (i > -1)
        {
            if (set.Contains(iList[i])) iList.RemoveAt(i);
            else i--;
        }
    }
}

Upvotes: 13

celerno

Reputation: 1387

Maybe this helps. Other ideas of the same type could be included.

IList<T> items;

IEnumerable<T> itemsToDelete;
...
{
   if(items.Equals(itemsToDelete)) //Equal lists?
     {
      items.Clear(); 
      return true;
     }


   if(  (double) items.Count/itemsToDelete.Count < 1){
      /* It is faster to iterate the small list first. */ 
              foreach (var x in items)
              {
                if(itemsToDelete.Contains(x)){/**/} 

              }
    }
   else{
           foreach (var x in itemsToDelete)
              {
               items.Remove(x);
              }
   }
}

Upvotes: 1

supercat

Reputation: 81169

If the IList<T> reference happens to refer to an instance of List<T>, casting to that type and using RemoveAll is apt to yield better performance than any other approach that doesn't rely upon the particulars of its implementation.

Otherwise, while the optimal approach will depend upon the relative fraction of items that are going to be removed and the nature of the IList<T>, I would suggest that your best bet might be to copy the IList<T> to a new List<T>, clear it, and selectively re-add items. Even if the items in the list are not conducive to efficient hashing, the fact that the items in the IEnumerable<T> are in the same sequence as those in the IList<T> would render that irrelevant. Start by reading an item from the IEnumerable<T>. Then copy items from the array to the list until that one is found. Then read the next item from the IEnumerable<T> and copy from the array to the list until that one is found, etc. Once the IEnumerable<T> is exhausted, copy the balance of the array to the List<T>.

This approach will be fast with many implementations of IList<T>. It has one major disadvantage, though: the fact that it deletes and re-adds each item might have unwanted side-effects on things like observable lists. If a list might be observable, one may have to use a much slower N^2 algorithm to ensure correctness. [BTW, it irks me that IList<T> has a Remove(T) method but lacks a much more useful RemoveAll(Func<T,bool>) method. The Remove(T) is largely redundant with IndexOf and RemoveAt, while RemoveAll would allow O(N) implementations of many operations that are O(N^2) in its absence if one isn't allowed to remove and re-add items.

Upvotes: 7

Most efficient way to remove multiple items from a IList&lt;T&gt;

Answers (4)

Related Questions

Most efficient way to remove multiple items from a IList<T>