Reputation: 17799
What is the most efficient way to remove multiple items from an IList<T>
object. Suppose I have an IEnumerable<T>
of all the items I want to remove, in the same order of occurrence that in the original list.
The only way I have in mind is:
IList<T> items;
IEnumerable<T> itemsToDelete;
...
foreach (var x in itemsToDelete)
{
items.Remove(x);
}
But I guess it's not efficient, because it has to go over the list from the beggining every time the method Remove
is called.
Upvotes: 14
Views: 20877
Reputation: 43535
This problem would be easier to solve if there was available an extension method RemoveAll
for the IList<T>
interface. So here is one:
/// <summary>
/// Removes all the elements that match the conditions defined by the
/// specified predicate.
/// </summary>
public static int RemoveAll<T>(this IList<T> list, Func<T, int, bool> predicate)
{
ArgumentNullException.ThrowIfNull(list);
ArgumentNullException.ThrowIfNull(predicate);
int i = 0, j = 0;
try
{
for (; i < list.Count; i++)
{
if (predicate(list[i], i)) continue;
if (j < i) list[j] = list[i];
j++;
}
}
finally
{
if (j < i)
{
for (; i < list.Count; i++, j++)
list[j] = list[i];
while (list.Count > j)
list.RemoveAt(list.Count - 1);
}
}
return i - j;
}
This is a modified version of a custom List<T>.RemoveAll
implementation that is found in this answer. Because of the absence of the RemoveRange
method in the IList<T>
interface, the rightmost leftover slots in the IList<T>
are cleared with repeated removals of the last element. This should be a pretty fast operation in most IList<T>
implementations.
Now the original problem of removing multiple items from a IList<T>
can be solved efficiently like this:
IList<T> items;
IEnumerable<T> itemsToDelete;
//...
HashSet<T> itemsToDeleteSet = new(itemsToDelete);
items.RemoveAll((x, _) => itemsToDeleteSet.Contains(x));
Upvotes: 0
Reputation: 39085
As the number of items to remove gets larger, you will probably find traversing the list and checking each item against a hashset of "items to remove" is more efficient. An extension method like this might help:
static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
{
var set = new HashSet<T>(itemsToRemove);
var list = iList as List<T>;
if (list == null)
{
int i = 0;
while (i < iList.Count)
{
if (set.Contains(iList[i])) iList.RemoveAt(i);
else i++;
}
}
else
{
list.RemoveAll(set.Contains);
}
}
I benchmarked using this little program below. (Note that it uses an optimized path if IList<T>
is actually a List<T>
.)
On my machine (and using my test data), this extention method took 1.5 seconds to execute vs 17 seconds for the code in your question. However, I have not tested with different sizes of data. I'm sure for removing just a couple of items RemoveAll2
will be faster.
static class Program
{
static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove)
{
var set = new HashSet<T>(itemsToRemove);
var list = iList as List<T>;
if (list == null)
{
int i = 0;
while (i < iList.Count)
{
if (set.Contains(iList[i])) iList.RemoveAt(i);
else i++;
}
}
else
{
list.RemoveAll(set.Contains);
}
}
static void RemoveAll2<T>(this IList<T> list, IEnumerable<T> itemsToRemove)
{
foreach (var item in itemsToRemove)
list.Remove(item);
}
static void Main(string[] args)
{
var list = Enumerable.Range(0, 10000).ToList();
var toRemove = new[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101,
103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,
173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239,
241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397,
401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467,
479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569,
571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643,
647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823,
827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911,
919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997};
list.RemoveAll(toRemove); // JIT
//list.RemoveAll2(toRemove); // JIT
var sw = Stopwatch.StartNew();
for (int i = 0; i < 10000; i++)
{
list.RemoveAll(toRemove);
//list.RemoveAll2(toRemove);
}
sw.Stop();
Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
Console.ReadKey();
}
}
UPDATE (for @KarmaEDV's & Mark Sowul's comments below): If you need to use a custom equality comparer, the extension method could have an overload that takes such a comparer:
public static void RemoveAll<T>(this IList<T> iList, IEnumerable<T> itemsToRemove, IEqualityComparer<T> comparer = null)
{
var set = new HashSet<T>(itemsToRemove, comparer ?? EqualityComparer<T>.Default);
if (iList is List<T>)
{
list.RemoveAll(set.Contains);
}
else
{
int i = iList.Count - 1;
while (i > -1)
{
if (set.Contains(iList[i])) iList.RemoveAt(i);
else i--;
}
}
}
Upvotes: 13
Reputation: 1387
Maybe this helps. Other ideas of the same type could be included.
IList<T> items;
IEnumerable<T> itemsToDelete;
...
{
if(items.Equals(itemsToDelete)) //Equal lists?
{
items.Clear();
return true;
}
if( (double) items.Count/itemsToDelete.Count < 1){
/* It is faster to iterate the small list first. */
foreach (var x in items)
{
if(itemsToDelete.Contains(x)){/**/}
}
}
else{
foreach (var x in itemsToDelete)
{
items.Remove(x);
}
}
}
Upvotes: 1
Reputation: 81169
If the IList<T>
reference happens to refer to an instance of List<T>
, casting to that type and using RemoveAll
is apt to yield better performance than any other approach that doesn't rely upon the particulars of its implementation.
Otherwise, while the optimal approach will depend upon the relative fraction of items that are going to be removed and the nature of the IList<T>
, I would suggest that your best bet might be to copy the IList<T>
to a new List<T>
, clear it, and selectively re-add items. Even if the items in the list are not conducive to efficient hashing, the fact that the items in the IEnumerable<T>
are in the same sequence as those in the IList<T>
would render that irrelevant. Start by reading an item from the IEnumerable<T>
. Then copy items from the array to the list until that one is found. Then read the next item from the IEnumerable<T>
and copy from the array to the list until that one is found, etc. Once the IEnumerable<T>
is exhausted, copy the balance of the array to the List<T>
.
This approach will be fast with many implementations of IList<T>
. It has one major disadvantage, though: the fact that it deletes and re-adds each item might have unwanted side-effects on things like observable lists. If a list might be observable, one may have to use a much slower N^2 algorithm to ensure correctness. [BTW, it irks me that IList<T>
has a Remove(T)
method but lacks a much more useful RemoveAll(Func<T,bool>)
method. The Remove(T)
is largely redundant with IndexOf
and RemoveAt
, while RemoveAll
would allow O(N) implementations of many operations that are O(N^2) in its absence if one isn't allowed to remove and re-add items.
Upvotes: 7