Reputation: 143

How to intersect two sorted integer arrays without duplicates?

This is an interview question that I am using as a programming exercise.

Input: Two sorted integer arrays A and B in increasing order and of different sizes N and M, respectively

Output: A sorted integer array C in increasing order that contains elements that appear in both A and B

Contraints: No duplicates are allowed in C

Example: For input A = {3,6,8,9} and B = {4,5,6,9,10,11}, the output should be C = {6,9}

Thank you for your answers, all! To summarize, there are two main approaches to this problem:

My original solution was to keep two pointers, one for each array, and scanning the arrays from left to right interchangeably, while picking out elements that match. So when we the current element of one array is larger than the second array, we keep incrementing the pointer of the second array until we either find the current first array element or overpass it (find one larger). I keep all matched in a separate array, which is returned once we reach the end of either one of the input arrays.

Another way that we could do this is to scan one of the arrays linearly, while using binary search to find a match in the second array. This would mean O(N*log(M)) time, if we scan A and for each of its N elements binary search on B (O(log(M)) time).

I've implemented both approaches and ran an experiment to see how the two compare (details on this can be found here). The Binary Search method seems to win when M is roughly 70 times larger than N, when N has 1 million elements.

Upvotes: 12

Answers (7)

Pavel Kosarev

Reputation: 1

public static int[] getIntersectionOfSortedArrays(int[] numbers1, int[] numbers2) {
    var size1 = numbers1.length;
    var size2 = numbers2.length;

    var elementsCount = Math.min(size1, size2);
    var result = new int[elementsCount];

    var i1 = 0;
    var i2 = 0;
    var index = 0;

    while (i1 < size1 && i2 < size2) {
        if (numbers1[i1] == numbers2[i2]
                && (index == 0 ||  numbers1[i1] != result[index - 1])) {
            result[index] = numbers1[i1];
            i1++;
            i2++;
            index++;
        } else if (numbers1[i1] > numbers2[i2]) {
            i2++;
        } else {
            i1++;
        }
    }

    return Arrays.copyOf(result, index);
}

Upvotes: 0

DRobinson

Reputation: 4471

Here's a memory improvement:

It would be better to store your results (C) in a dynamic structure, like a linked list, and create an array after you're done finding the intersecting elements (exactly as you do with array r). This technique would be especially good if you have very large arrays for A and B and expect the common elements to be few in comparison (why search for a huge chunk of contiguous memory when you only need a small amount?).

EDIT: one more thing that I would change, and this might be just a little nit-picky, is that I would avoid using unbound loops when the worst case number of iterations is known before hand.

Upvotes: 0

Periastron

Reputation: 663

Using arraylist to store result.

public ArrayList<Integer> arrayIntersection(int [] a, int[] b)
{
    int len_a=a.length;
    int len_b=b.length;
    int i=0;
    int j=0;
    ArrayList<Integer> alist=new ArrayList();

    while(i<len_a && j<len_b)
    {
        if(a[i]<b[j])
            i++;
        else if(a[i]>b[j])
            j++;
        else if(a[i]==b[j])
        {
            alist.add(a[i]);
            i++;
            j++;

        }
    }

   return alist;    
  }

Upvotes: 3

Tim Gee

Reputation: 1062

This problem essentially reduces to a join operation and then a filter operation (to remove duplicates and only keep inner matches).

As the inputs are both already sorted, the join can be efficiently achieved through a merge join, with O(size(a) + size(b)).

The filter operation will be O(n) because the output of the join is sorted and to remove duplicates all you have to do is check if the each element is the same as the one before it. Filtering only the inner matches is trivial, you just discard any elements that were not matched (the outer joins).

There are opportunities for parallelism (both in the join and filter) to achieve better performance. For example the Apache Pig framework on Hadoop offers a parallel implementation of a merge join.

There are obvious trade-offs between performance and complexity (and thus maintainability). So I would say a good answer to a interview question really needs to take account of the performance demands.

Set based comparison - O(nlogn) - Relatively slow, very simple, use if there are no performance concerns. Simplicity wins.
Merge join + Filter - O(n) - Fast, prone to coding error, use if performance is an issue. Ideally try to leverage an existing library to do this, or perhaps even use a database if appropriate.
Parallel Implementation - O(n/p) - Very fast, requires other infrastructure in place, use if the volume is very large and anticipated to grow and this is a major performance bottleneck.

(Also note that the function in the question intersectSortedArrays is essentially a modified merge join, where the filter is done during the join. You can filter afterwards at no performance loss, although a slightly increased memory footprint).

Final thought.

In fact, I suspect most modern commercial RDBMSs offer thread parallelism in their implementation of joins, so what the Hadoop version offers is machine-level parallelism (distribution). From a design point of view, perhaps a good, simple solution to the question is to put the data on a database, index on A and B (effectively sorting the data) and use an SQL inner join.

Upvotes: 5

Kent

Reputation: 195049

I don't know if it is a good idea to solve the problem in this way:

say

  A,B are 1 based arrays
    A.length=m
    B.length=n

1) init an array, C, with min(m,n) length

2) only focus on the common part by checking the first and last element. here binary search could be used. take an example to save some words:

 A[11,13,15,18,20,28,29,80,90,100.........300,400]
    ^                                          ^
 B[3,4,5,6,7.8.9.10.12,14,16,18,20,..400.....9999]
                     ^                ^


then we need only focus  on

    A[start=1](11)-A[end=m](400)
    and
    B[start=9](12)-B[end](400)

3). compare the range (end-start) of both Arrays. taking the array with smaller range , say A, for each element A[i] from A[start] ~ A[end], do binary search in B[start,end],

if found, put element in C, reset B.start to foundIdx+1,
otherwise B.start is set to the smallest element [j], which B[j] is greater than A[i], to narrow the range

4) continue 3) untill all elements in A[start, end] was processed.

by step 1, we could find the case if there is no intersection between two Array.
when doing binary search in step 3, we compare A[i] with A[i-1], if same, skip A[i]. in order to keep elements in C are unique.

in this way, worse case would be lg(n!) if(A and B are same) ? not sure.

Avg case?

Upvotes: 0

bchetty

Reputation: 2241

If you are using 'Integer' (object) arrays and would like to use the java API methods, you can check the below code. Note that the below code probably has more complexity (as it uses some conversion logic from one datastructure to other) and memory consumption (because of using objects) than the primitive method, as listed above. I just tried it (shrugs):

public class MergeCollections {
    public static void main(String[] args) {
        Integer[] intArray1 = new Integer[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Integer[] intArray2 = new Integer[] {2, 3, 5, 7, 8, 11, 13};

        Set<Integer> intSet1 = new TreeSet<Integer>();
        intSet1.addAll(Arrays.asList(intArray1));
        intSet1.addAll(Arrays.asList(intArray2));
        System.out.println(intSet1);
    }
}

And the output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13]

Also, check this link: Algolist - Algo to merge sorted arrays

EDIT: Changed HashSet to TreeSet

EDIT 2: Now that the question is edited and clear, I'm adding a simple solution to find intersection :

public class Intersection {
    public static void main(String[] args) {
        Integer[] intArray1 = new Integer[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Integer[] intArray2 = new Integer[] {2, 3, 5, 7, 8, 11, 13};

        List<Integer> list1 = Arrays.asList(intArray1);
        Set<Integer> commonSet = new TreeSet<Integer>();
        for(Integer i: intArray2) {
            if(list1.contains(i)) {
                commonSet.add(i);
            }
        }

        System.out.println(commonSet);
    }
}

Upvotes: -1

NPE

Reputation: 500317

How about:

public static int[] intersectSortedArrays(int[] a, int[] b){
    int[] c = new int[Math.min(a.length, b.length)]; 
    int ai = 0, bi = 0, ci = 0;
    while (ai < a.length && bi < b.length) {
        if (a[ai] < b[bi]) {
            ai++;
        } else if (a[ai] > b[bi]) {
            bi++;
        } else {
            if (ci == 0 || a[ai] != c[ci - 1]) {
                c[ci++] = a[ai];
            }
            ai++; bi++;
        }
    }
    return Arrays.copyOfRange(c, 0, ci); 
}

Conceptually it's similar to yours, but contains a number of simplifications.

I don't think you can improve on the time complexity.

edit: I've tried this code, and it passes all of your unit tests.

Upvotes: 6

How to intersect two sorted integer arrays without duplicates?

Answers (7)

Related Questions