Best way to remove one arraylist elements from another arraylist

What is the best performance method in Java (7,8) to eliminate integer elements of one Arraylist from another. All the elements are unique in the first and second lists.

At the moment I know the API method removeall and use it this way:

tempList.removeAll(tempList2);

The problem appears when I operate with arraylists have more than 10000 elements. For example when I remove 65000 elements, the delay appears to be about 2 seconds. But I need to opperate with even more large lists with more than 1000000 elements.

What is the strategy for this issue?

Maybe something with new Stream API should solve it?

Upvotes: 18

Views: 12263

Answers (3)

Marco13
Marco13

Reputation: 54659

tl;dr:

Keep it simple. Use

list.removeAll(new HashSet<T>(listOfElementsToRemove));

instead.


As Eran already mentioned in his answer: The low performance stems from the fact that the pseudocode of a generic removeAll implementation is

public boolean removeAll(Collection<?> c) {
    for (each element e of this) {
        if (c.contains(e)) {
            this.remove(e);
        }
    }
}

So the contains call that is done on the list of elements to remove will cause the O(n*k) performance (where n is the number of elements to remove, and k is the number of elements in the list that the method is called on).

Naively, one could imagine that the this.remove(e) call on a List might also have O(k), and this implementation would also have quadratic complexity. But this is not the case: You mentioned that the lists are specifically ArrayList instances. And the ArrayList#removeAll method is implemented to delegate to a method called batchRemove that directly operates on the underlying array, and does not remove the elements individually.

So all you have to do is to make sure that the lookup in the collection that contains the elements to remove is fast - preferably O(1). This can be achieved by putting these elements into a Set. In the end, it can just be written as

list.removeAll(new HashSet<T>(listOfElementsToRemove));

Side notes:

The answer by Eran has IMHO two major drawbacks: First of all, it requires sorting the lists, which is O(n*logn) - and it's simply not necessary. But more importantly (and obviously) : Sorting will likely change the order of the elements! What if this is simply not desired?

Remotely related: There are some other subtleties involved in the removeAll implementations. For example, HashSet removeAll method is surprisingly slow in some cases. Although this also boils down to the O(n*n) when the elements to be removed are stored in a list, the exact behavior may indeed be surprising in this particular case.

Upvotes: 21

Eran
Eran

Reputation: 393946

Well, since removeAll checks for each element of tempList whether it appears in tempList2, the running time is proportional to the size of the first list multiplied by the size of the second list, which means O(N^2) unless one of the two lists is very small and can be considered as "constant size".

If, on the other hand, you pre-sort the lists, and then iterate over both lists with a single iteration (similar to the merge step in merge sort), the sorting will take O(NlogN) and the iteration O(N), giving you a total running time of O(NlogN). Here N is the size of the larger of the two lists.

If you can replace the lists by a sorted structure (perhaps a TreeSet, since you said the elements are unique), you can implement removeAll in linear time, since you won't have to do any sorting.

I haven't tested it, but something like this can work (assuming both tempList and tempList2 are sorted) :

Iterator<Integer> iter1 = tempList.iterator();
Iterator<Integer> iter2 = tempList2.iterator();
Integer current = null;
Integer current2 = null;
boolean advance = true;
while (iter1.hasNext() && iter2.hasNext()) {
    if (advance) {
        current = iter1.next();
        advance = false;
    }
    if (current2 == null || current > current2) {
        current2 = iter2.next();
    }
    if (current <= current2) {
        advance = true;
        if (current == current2)
            iter1.remove();
    }
}

Upvotes: 10

Ayman
Ayman

Reputation: 11585

I suspect removing from an ArrayList, is a perfromance hit since the list may either be divided when an element in the middle is removed, or if the list must be compacted after an element is removed. It may be faster to do this:

  1. Create 'Set' of the elements to be removed
  2. Create a new result ArrayList that you need, call it R. You can give it enough size at construction.
  3. Iterate thru the original list you need elements from it removed, if the element is found in the Set, don't add it to R, otherwise add it.

This should have O(N); if creating the Set and a lookup in it is assumed constant.

Upvotes: 2

Related Questions