Fly
Fly

Reputation: 872

Getting unique elements and intersection of two (or x) lists efficiently

If I have two (or more) lists, I would like to get all parts of the "venn diagram" made by those lists: i.e. elements unique to A, elements unique to B, and the intersection of A and B.

The "standard", but also naive, way of getting such a result would be using two loops, by going through all of A and checking if the elements are in B or not, thus getting the results of A_unique and intersection. By then looping through B and checking if the elements are in A, we would get B_unique.

Most languages have some kind of intersect or difference function implemented in their standard libraries, but they usually work that way anyway.

However, this seems extremely inefficient. Is there a better way / algorithm of getting those three parts, without having to loop twice (and perhaps even without using standard "contains" calls on the other list, which usually loops through that list as well, thus making each loop O(n^2)...)?

Would it perhaps already be an improvement to make both lists be queues, and pop the first item on each loop, thus reducing the amount of subsequent checks?

Upvotes: 0

Views: 233

Answers (1)

rici
rici

Reputation: 241951

You can do this with a single loop in (at least) two ways, with a preprocessing step. The following algorithms assume that neither list has duplicate elements; modifying it to handle duplicate elements is simpler with the second algorithm but can be done with the first one as well.

With a hash set.

  1. Create a hash table from list A. (Perhaps better, from the smaller of the lists, but I'm going to write this as if it's list A.)

  2. Create empty lists Only B and In Both

  3. For each element in B:

    • If it's not in the hash table, put in in Only B.
    • If it's in the hash table, remove it and put in In Both.
  4. When you're done with the loop, whatever is left in the hash table is Only A.

Using heaps / priority queues

  1. Use makeheap to turn both lists into min heaps
  2. While both heaps have elements:
    • If the smallest element in A is (strictly) less than the smallest element in B, remove it and put it in Only A.
    • Otherwise, if the smallest element in B is (strictly) less than the smallest element in A, remove it and put it into Only B
    • Otherwise, the smallest element of both heaps are equal. Remove both of them and put one in In Both.
  3. When one heap is empty, put all the elements of the other heap into Only A or Only B, depending on which heap still has elements left. If you want the new sets to be sorted, use the heap's remove-min method; if you don't care, you can save a few cycles by just copying from the underlying array.

You can run this algorithm in-place to avoid memory allocations, as long as you don't care about the order of the resulting lists, by keeping In Both at the end of one (or both) of the originals, maintaining a partition point, which is initially at the very end (indicating that the In Both list is empty). Instead of putting an element present in both lists into a separate data structure, you remove it as usual from its heap, which moves it to the beginning of the free space, then swap it with the element just before the partition point, and finally move the partition back one element.

Upvotes: 1

Related Questions