cpray89

Reputation: 95

most efficient algorithm to get union of 2 ordered lists

I need to find the union of 2 descending ordered lists (list1 and list2), where the union would be each element from both lists without duplicates. Assume the list elements are integers. I am using big O notation to determine the most efficient algorithm to solve this problem. I know the big O notation for the 1st, but I do not know the big O notation for the 2nd. Can someone tell me the big O notation of the 2nd algorithm so I can decide which algorithm to implement? If someone knows a better algorithm than one of these, could you help me understand that as well? Thanks in advance.

Here are my two algorithms. . .

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Algorithm #1: O(N * log base2 N)

Starting at the first element of list1, 
while(list1 is not at the end of the list) {
    if(the current element in list1 is not in list2)    // Binary Search -> O(log base2 N)
        add the current element in list1 to list2
    go to the next element in list1 }

list2 is now the union of the 2 lists

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Algorithm #2: O(?)

Starting at the first elements of each list,
LOOP_START:
    compare the current elements of the lists
    whichever element is greater, put into a 3rd list called list3
    go to the next element in the list whose element was just inserted into list3
    branch to LOOP_START until either list1 or list2 is at the end of its list
insert the remaining elements from either list1 or list2 into list3 (the union)

list3 now contains the union of list1 and list2

Upvotes: 7

Views: 4153

Answers (8)

Shabbir

Reputation: 1

I implemented a TypeScript (JS) union operation for two arrays of objects in one of my previous projects. The data set was too large, and library functions such as those in underscore or lodash were not fast enough. After some thought, I came up with the binary-search-based algorithm below. I hope it helps someone with performance tuning.

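As far as complexity is concerned, each lookup is a binary search, which is O(log(N)). The overall cost is higher, though: for arrays of lengths N and M, sorting takes O(N log N + M log M), and each splice can shift up to M elements, so the worst case is O(N × M).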

Basically, the code takes two unordered object arrays and a key name to compare on, and then: 1) sorts both arrays, 2) iterates over each element of the first array and deletes its match from the second array, and 3) concatenates the resulting second array onto the first array.

    private sortArrays = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): void => {
        function comparer(a, b) {
            if (a[propertyName] < b[propertyName])
                return -1;
            if (a[propertyName] > b[propertyName])
                return 1;
            return 0;
        }

        arr1.sort(comparer);
        arr2.sort(comparer);
    }

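    // For each element of arr1, removes one matching element (by key) from arr2.
    // Note: both arrays are sorted in place and arr2 is modified.
    private difference = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): Array<Object> => {

        this.sortArrays(arr1, arr2, propertyName);
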
        for (var i = 0; i < arr1.length; i++) {
            var obj = {
                loc: 0
            };
            if (this.OptimisedBinarySearch(arr2, arr2.length, obj, arr1[i], propertyName))
                arr2.splice(obj.loc, 1);
        }

        return arr2;
    }

    // Binary search in the ascending-sorted arr; on success, stores the match index in obj.loc.
    private OptimisedBinarySearch = (arr, size, obj, val, propertyName): boolean => {
        var first = 0;
        var last = size - 1;
        var mid;

        if (!arr.length)
            return false;
        while (arr[first][propertyName] <= val[propertyName] && val[propertyName] <= arr[last][propertyName]) {
            mid = first + Math.floor((last - first) / 2);

            if (val[propertyName] == arr[mid][propertyName]) {
                obj.loc = mid;
                return true;
            }
            else if (val[propertyName] < arr[mid][propertyName])
                last = mid - 1;
            else
                first = mid + 1;
        }
        return false;
    }

    private UnionAll = (arr1, arr2, propertyName): Array<Object> => {
        return arr1.concat(this.difference(arr1, arr2, propertyName));
    } 

    //example
    var YourFirstArray = [{x:1},{x:2},{x:3}]
    var YourSecondArray= [{x:0},{x:1},{x:2},{x:3},{x:4},{x:5}]
    var keyName = "x";
    this.UnionAll(YourFirstArray, YourSecondArray, keyName)

Upvotes: 0

Luca Mastrostefano

Reputation: 3271

With the following algorithm you can have the two lists merged in O(n+m).

[Sorry, I have used Python for simplicity, but the algorithm is the same in every language]

Note that the algorithm also maintains the items sorted in the result list.

def merge(list1, list2):
    result = []
    i1 = 0
    i2 = 0
    #iterate over the two lists
    while i1 < len(list1) and i2 < len(list2):
        #if the current items are equal, add just one and go to the next two items
        if list1[i1] == list2[i2]:
            result.append(list1[i1])
            i1 += 1
            i2 += 1
        #if the item of list1 is greater than the item of list2, add it and go to next item of list1
        elif list1[i1] > list2[i2]:
            result.append(list1[i1])
            i1 += 1
        #if the item of list2 is greater than the item of list1, add it and go to next item of list2
        else:
            result.append(list2[i2])
            i2 += 1
    #Add the remaining items of list1
    while i1 < len(list1):
        result.append(list1[i1])
        i1 += 1
    #Add the remaining items of list2
    while i2 < len(list2):
        result.append(list2[i2])
        i2 += 1
    return result

print(merge([10,8,5,1],[12,11,7,5,2]))

Output:

[12, 11, 10, 8, 7, 5, 2, 1]

Upvotes: 1

acelent

Reputation: 8135

There are a few things that need to be specified:

  • Do the input lists contain duplicates?
  • Must the result be ordered?

I'll assume that, using std::list, you can cheaply insert at the head or at the tail.

Let's say List 1 has N elements and List 2 has M elements.


Algorithm 1

It iterates over every item of List 1 searching for it in List 2.

Assuming that there may be duplicates and that the result must be ordered, the worst case for the search is when no element of List 1 exists in List 2, so the search alone costs at least:

  • O(N × M).

To insert the item of List 1 in the right place, you need to iterate over List 2 again up to the point of insertion. The worst case is when every item in List 1 is smaller (if List 2 is searched from the beginning) or greater (if List 2 is searched from the end). Since the previous items of List 1 have already been inserted into List 2, there would be M iterations for the first item, M + 1 for the second, M + 2 for the third, and so on, up to M + N - 1 iterations for the last item, for an average of M + (N - 1) / 2 per item.

Something like:

  • N × (M + (N - 1) / 2)
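
As a quick sanity check of that count, here is a small sketch. The helper name `insertion_steps` is hypothetical, and it assumes the worst case described above, where the k-th insertion walks M + k items:

```python
def insertion_steps(n, m):
    """Total items walked when the k-th insertion (k = 0..n-1) scans m + k items."""
    return sum(m + k for k in range(n))


n, m = 4, 10
total = insertion_steps(n, m)          # 10 + 11 + 12 + 13
closed_form = n * (m + (n - 1) / 2)    # N × (M + (N - 1) / 2) from the text
print(total, closed_form)              # prints: 46 46.0
```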

For big-O notation, constant factors don't matter, so:

  • N × (M + (N - 1))

For big-O notation, constant additive terms don't matter either, so:

  • O(N × (M + N))

Adding to the original O(N × M):

  • O(N × M) + O(N × (M + N))
  • O(N × M) + O(N × M + N²)

The second form just makes the constant-factor elimination evident, since the sum contains 2 × (N × M); thus:

  • O(N × (M + N))
  • O(N² + N × M)

These two are equivalent; use whichever you prefer.

Possible optimizations:

  • If the result doesn't have to be ordered, insertion can be O(1), hence the worst-case time is:

    • O(N × M)

  • Don't just test each List 1 item against List 2 for equality; also compare with e.g. greater-than, so that you can stop searching in List 2 as soon as List 1's item is greater than List 2's item; this wouldn't reduce the worst case, but it would reduce the average case
  • Keep the List 2 iterator that points to where List 1's item was found to be greater than List 2's item, to make the sorted insertion O(1); on insertion, make sure to keep an iterator that starts at the inserted item, because although List 1 is ordered, it might contain duplicates; with these two, the worst-case time becomes:

    • O(N × M)

  • For the next iterations, search for List 1's item in the rest of List 2, starting from the iterator we kept; this reduces the worst case, because if you reach the end of List 2, you'll just be "removing" duplicates from List 1; with these three, the worst-case time becomes:

    • O(N + M)

By this point, the only difference between this algorithm and Algorithm 2 is that List 2 is changed to contain the result, instead of creating a new list.


Algorithm 2

This is the merge step of merge sort.

You'll be walking every element of List 1 and every element of List 2 once, and insertion is always made at the head or tail of the list, hence the worst-case time is:

  • O(N + M)

If there are duplicates, they're simply discarded. The result is more easily made ordered than not.
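
A minimal sketch of such a merge that discards duplicates, including duplicates inside a single input list. It assumes Python lists of integers sorted in descending order; the name `union_descending` is made up for illustration:

```python
def union_descending(list1, list2):
    """Merge two descending lists into one descending list without duplicates."""
    result = []
    i, j = 0, 0
    while i < len(list1) or j < len(list2):
        # Take from list1 when list2 is exhausted or list1's item is >= list2's.
        if j >= len(list2) or (i < len(list1) and list1[i] >= list2[j]):
            item = list1[i]
            i += 1
        else:
            item = list2[j]
            j += 1
        # Inputs are sorted, so a duplicate always equals the last item appended.
        if not result or result[-1] != item:
            result.append(item)
    return result


print(union_descending([10, 8, 8, 5, 1], [12, 11, 7, 5, 5, 2]))
# prints: [12, 11, 10, 8, 7, 5, 2, 1]
```

Each loop iteration advances exactly one index, so the loop runs N + M times in total.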


Final Notes

If there are no duplicates, insertion can be optimized in both cases. For instance, with doubly-linked lists, we can easily check if the last element in List 1 is greater than the first element in List 2 or vice-versa, and simply concatenate the lists.

This can be further generalized for any tail of List 1 and List 2. For instance, in Algorithm 1, if a List 1's item is not found in List 2, we can concatenate List 2 and the tail of List 1. In Algorithm 2, this is done in the last step.

The worst case, when List 1's items and List 2's items are interleaved, is not reduced, but again the average case is reduced, and in many cases by a big factor that makes a big difference In Real Life™.
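
The non-overlapping shortcut described above could be sketched like this. It assumes descending Python lists with no duplicates; with Python lists the concatenation copies, so it is O(N + M) rather than the O(1) splice a doubly-linked list allows. `concat_if_disjoint` is a hypothetical name:

```python
def concat_if_disjoint(list1, list2):
    """Return the concatenated union if the value ranges don't overlap, else None."""
    if not list1 or not list2:
        return list1 + list2
    if list1[-1] > list2[0]:    # all of list1 lies above all of list2
        return list1 + list2
    if list2[-1] > list1[0]:    # all of list2 lies above all of list1
        return list2 + list1
    return None                 # ranges overlap: fall back to a full merge


print(concat_if_disjoint([9, 8, 7], [3, 2]))   # prints: [9, 8, 7, 3, 2]
print(concat_if_disjoint([9, 5], [7, 1]))      # prints: None
```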

I ignored:

  • Allocation times
  • Worst-case space differences in the algorithms
  • Binary search, because you mentioned lists, not arrays or trees

I hope I didn't make any blatant mistake.

Upvotes: 0

Khalefa

Reputation: 2304

Note that algorithm 2 only works if the input lists are sorted. If they are not, sorting them first costs O(m*lg(m) + n*lg(n)).

Alternatively, you can build a hash table from the first list and then, for each item in the second list, check whether it already exists in the hash table. This runs in O(m+n) expected time.
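
A rough sketch of the hash-table idea, using Python's built-in `set` as the hash table. The function name is hypothetical, it assumes the first list has no internal duplicates, and the result is not kept in sorted order:

```python
def union_with_hash(list1, list2):
    """Union via a hash set: list1's items, then list2's items not already seen."""
    seen = set(list1)            # hash table built from the first list
    result = list(list1)
    for item in list2:
        if item not in seen:     # average O(1) membership test
            seen.add(item)       # also guards against duplicates inside list2
            result.append(item)
    return result


print(union_with_hash([10, 8, 5, 1], [12, 11, 7, 5, 2]))
# prints: [10, 8, 5, 1, 12, 11, 7, 2]
```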

Upvotes: 0

devsathish

Reputation: 2419

Here is another approach: iterate through both lists and insert all the values into a set. This removes all duplicates, and the result is the union of the two lists. Two important notes: you'll lose the order of the numbers, and it takes additional space.

Time complexity: O(n + m)

Space Complexity: O(n + m)

If you need to maintain order of the result set, use some custom version of LinkedHashMap.
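
For illustration only, here is how the same idea might look in Python, which is an assumption since the answer refers to Java's LinkedHashMap. A plain `set` loses the order, while `dict.fromkeys` keeps first-seen insertion order (guaranteed since Python 3.7), which plays a similar role. Both function names are made up:

```python
def union_via_set(list1, list2):
    """Unordered union: the set drops duplicates but also the ordering."""
    return set(list1) | set(list2)


def union_insertion_ordered(list1, list2):
    """Union that keeps first-seen order, using dict keys as an ordered set."""
    return list(dict.fromkeys(list1 + list2))


a, b = [10, 8, 5, 1], [12, 11, 7, 5, 2]
print(union_insertion_ordered(a, b))   # prints: [10, 8, 5, 1, 12, 11, 7, 2]
```

Note that insertion order is not the same as descending order; restoring the latter with `sorted(..., reverse=True)` would add an O(k log k) step for k distinct values.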

Upvotes: 0

Abhishek Bansal

Reputation: 12715

Complexity Analysis:

Say the length of list 1 is N and that of list 2 is M.

Algorithm 1:
At the risk of sounding contrarian, I would argue that the complexity of this algorithm as written is O(N*M), not O(NlogM).

For each element in list 1 (O(N)), we search for it in list 2 (O(logM)). The complexity of this algorithm therefore 'seems' to be O(NlogM).

However, we are also inserting the element into list 2. The new element must be inserted in the proper place so that list 2 remains sorted for further binary searches. If we are using an array as the data structure, the insertion takes O(M) time.

Hence the order of complexity is O(N*M) for the algorithm as is.

A modification can be made in which the new element is appended to the end of list 2 (the list is then no longer ordered) and the binary search is carried out over indices 0 to M-1 rather than up to the new size-1. In this case the complexity is O(N*logM), since we carry out N binary searches on a list of length M.

To make the list ordered again, we will have to merge the two ordered parts (0 to M-1 and M to newSize-1). This can be done in O(N+M) time (one merge operation in merge sort of array length N+M). Hence the net time complexity of this algorithm shall be

O(NlogM + N + M)
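
A sketch of this modified version, under some assumptions: Python lists in descending order, no duplicates inside list 1, and a hand-written binary search because the data is descending. Both function names are hypothetical:

```python
def contains_descending(items, hi, target):
    """Binary search for target in items[0:hi], which must be sorted descending."""
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return True
        if items[mid] > target:
            lo = mid + 1      # target can only be further right (smaller values)
        else:
            hi = mid          # target can only be further left (larger values)
    return False


def union_append_then_merge(list1, list2):
    m = len(list2)
    work = list(list2)                    # copy so the caller's list2 is untouched
    for item in list1:                    # N binary searches over the first M items
        if not contains_descending(work, m, item):
            work.append(item)             # appended tail stays in descending order
    head, tail = work[:m], work[m:]       # two ordered parts with no shared values
    result, i, j = [], 0, 0
    while i < len(head) and j < len(tail):    # one merge pass, O(N + M)
        if head[i] > tail[j]:
            result.append(head[i])
            i += 1
        else:
            result.append(tail[j])
            j += 1
    result.extend(head[i:])
    result.extend(tail[j:])
    return result


print(union_append_then_merge([10, 8, 5, 1], [12, 11, 7, 5, 2]))
# prints: [12, 11, 10, 8, 7, 5, 2, 1]
```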

Space complexity is O(max(N,M)) not considering the original lists space and only considering the extra space required in list 2.

Algorithm 2:
At each iteration, we move at least one pointer forward. The total distance to be travelled by both pointers is N + M. Hence the worst-case time complexity is O(N+M), which is better than the 1st algorithm.

However, the space complexity required in this case is larger (O(N+M)).

Upvotes: 0

Udo Klein

Reputation: 6902

The second is O(n+m) while the first is O(n log(m) + m). Thus the second is significantly better.

Upvotes: 8

angelatlarge

Reputation: 4150

Here's my assessment of the situation:

  • Your first algorithm runs in n log n time: you are doing the binary search for every element in the first list, right?
  • Your second algorithm is not entirely complete: you don't say what to do if the elements in the two lists are equal. However, given the right logic for dealing with equal elements, your second algorithm is like the merge part of merge sort: it will run in linear time (i.e. O(N + M)). It is optimal in the sense that you cannot do better: you cannot merge two ordered lists without looking at every element in both lists at least once.

Upvotes: 9
