samba
samba

Reputation: 3091

Java - Find the difference between two arrays with duplicates

I've implemented a method for finding the difference between two unsorted arrays. Currently I have achieved getting the diff without duplicates. But how to make it take duplicates into consideration as well?
For example for the below input arrays I expect the output [4 5 3]:

int[] arr1 = {1, 2, 3, 4, 5, 5};
int[] arr2 = {1, 2, 3, 5, 3};

For these input arrays I expect [7 7 9]

int[] arr3 = {7, 7, 4, 9, 6};
int[] arr4 = {4, 6};

//

static ArrayList<Integer> findDifference(int[] a, int[] b) {
    ArrayList<Integer> arr1 = new ArrayList<Integer>() {
        { for (int i : a) add(i); }
    };
    ArrayList<Integer> arr2 = new ArrayList<Integer>() {
        { for (int i : b) add(i); }
    };

    if (arr1.size() > arr2.size()) {
        arr1.removeAll(arr2);
        return arr1;
    } else {
        arr2.removeAll(arr1);
        return arr2;
    }
}

Upvotes: 1

Views: 3159

Answers (4)

kevin ternet
kevin ternet

Reputation: 4612

Here is a solution that works with both exemple :

public static void main(String[] args) {
    int[] arr1 = {1, 2, 3, 4, 5, 5};
    int[] arr2 = {1, 2, 3, 5, 3};
    System.out.println(findDifference(arr1, arr2));
    int[] arr3 = {7, 7, 4, 9, 6};
    int[] arr4 = {4, 6};
    System.out.println(findDifference(arr3, arr4));
}
static ArrayList<Integer> findDifference(int[] a, int[] b) {
    ArrayList<Integer> list1 = new ArrayList<Integer>();
    Arrays.stream(a).forEach(e -> list1.add(e));
    ArrayList<Integer> list2 = new ArrayList<Integer>();
    Arrays.stream(b).forEach(e -> list2.add(e));

    ArrayList<Integer> list1Copy = new ArrayList<Integer>();
    ArrayList<Integer> list2Copy = new ArrayList<Integer>();
    list1Copy.addAll(list1);
    list2Copy.addAll(list2);

    list1.forEach(e -> list2Copy.remove(e));
    list2.forEach(e -> list1Copy.remove(e));
    list1Copy.addAll(list2Copy);
    return list1Copy;
}

output :

[4, 5, 3] [7, 7, 9]

The principle is to process removing operation on copy to be abble to iterate again on initial list

Upvotes: 0

BeUndead
BeUndead

Reputation: 3618

Almost certainly not an optimal solution, but as something you can hopefully work with:

private static <X> Collection<X> findDiff(final Collection<X> a, final Collection<X> b) {
    // Copy the Collections so you don't modify inputs
    // and so you can safely 'remove' from them.
    final List<X> aCopy = new ArrayList<>(a);
    final List<X> bCopy = new ArrayList<>(b);

    // Remove all common elements from the copies
    // Using 'removeAll' will pull out duplicates,
    // so do this one-by-one.
    for (final X bElement : b) {
        aCopy.remove(bElement);
    }
    // Note it's important to iterate over 'a' here, not
    // aCopy since the elements of aCopy (may) have had some
    // entries 'remove'd.
    for (final X aElement : a) {
        bCopy.remove(aElement);
    }

    // Combine the two cleared out lists to find
    // the cumulative difference.
    final List<X> diff = new ArrayList<>(aCopy);
    diff.addAll(bCopy);

    return Collections.unmodifiableCollection(diff);
}

Note, you can convert your int[] to a Collection<Integer> using something simple like:

IntStream.of(arr).boxed().collect(Collectors.toList());

Note also: you can do this with fewer intermediate Collections. You only need to copy one of the input ones if you don't mind modifying the inputs. And you don't need to combine the two into a new diff one. This was just something to work with (and more explanatory).

Upvotes: 1

Konstantin Yovkov
Konstantin Yovkov

Reputation: 62864

You can keep a count for each value in the first array. You can use a HashMap to hold how many occurences you have for a specific value.

Then, for each value in the second array, you can decrease the already calculated count for this value. In the end, if the count for a specific value is 0, this would mean that there were the same number of appearances in the both arrays. Otherwise, one of the arrays contained more occurences of value. The number of differences for a specific value would be abs(count[value]) (as it can get negative, in the case when the second arrays contains more occurences of value than the first array).

This Java code illustrates the approach:

public List<Integer> findDiff(int[] first, int[] second) {
  Map<Integer, Integer> count = new HashMap<>();
  for (int value : first) {
    int current = count.getOrDefault(value, 0);
    count.put(value, current + 1);
  }
  for (int value : second) {
    int current = count.getOrDefault(value, 0);
    count.put(value, current - 1);
  }
  List<Integer> result = new ArrayList<>();
  for (Map.Entry<Integer, Integer> entry : count.getEntrySet()) {
    int diff = entry.getValue();
    int times = Math.abs(diff);
    for (int i = 0; i < times; i++) {
      result.add(entry.getKey());
    }
  }
  return result;
}

Clearly, we have a linear complexity for both time and memory.

Upvotes: 3

akourt
akourt

Reputation: 5563

If you want an absolute difference between the two arrays (in this case the only different element being 4) you could calculate the union and the intersection of the two sets.

To exclude duplicates you could also use a Set instead of a List to guarantee uniqueness. A very simple sample could be the following:

    public static void main(String... args) {
        Integer[] arr1 = {1, 2, 3, 4, 5, 5};
        Integer[] arr2 = {1, 2, 3, 5, 3};

        Set<Integer> diffs = findDiff(arr1, arr2);
        diffs.forEach(System.out::println);
    }

    public static Set<Integer> findDiff(Integer[] array1, Integer[] array2) {
        List<Integer> list1 = Arrays.asList(array1);
        List<Integer> list2 = Arrays.asList(array2);
        Set<Integer> union = new HashSet<>(list1);
        union.addAll(list2);
        Set<Integer> intersection = new HashSet<>(list1);
        intersection.retainAll(list2);
        union.removeAll(intersection);
        return union;
    }

Upvotes: 0

Related Questions