maxpayne

Reputation: 2499

In an array of integers, exactly one value appears twice. How do you determine which one?

Assume that the array has integers between 1 and 1,000,000.

I know some popular ways of solving this problem:

  1. If all numbers between 1 and 1,000,000 are included, find the sum of the array elements and subtract the expected total n*(n+1)/2 from it; the difference is the duplicate
  2. Use a hash map (needs extra memory)
  3. Use a bit map (less memory overhead)
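
For reference, the first approach can be sketched in C roughly as follows. This is only a sketch under stated assumptions: the function name `duplicate_by_sum` is made up for illustration, and the array is assumed to hold n + 1 values containing every integer from 1 to n once plus one duplicate. A 64-bit sum is used because 1 + 2 + ... + 1,000,000 does not fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a holds len = n + 1 values, namely every integer
   in 1..n exactly once plus one duplicated value. */
int64_t duplicate_by_sum(const int *a, size_t len)
{
    assert(len >= 2);                    /* need at least n = 1 plus the duplicate */

    int64_t n = (int64_t)len - 1;
    int64_t expected = n * (n + 1) / 2;  /* 1 + 2 + ... + n */
    int64_t actual = 0;

    for (size_t i = 0; i < len; i++)
        actual += a[i];

    return actual - expected;            /* the surplus is the duplicate */
}
```

Called on the five-element array 3, 1, 4, 2, 3 this returns 3, since the elements sum to 13 and 1 + 2 + 3 + 4 is 10.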

I recently came across another solution and I need some help in understanding the logic behind it:

Keep a single radix accumulator. You exclusive-or the accumulator with both the index and the value at that index.

The fact that x ^ C ^ x == C is useful here, since each number will be xor'd twice, except the one that's in there twice, which will appear 3 times. (x ^ x ^ x == x) And the final index, which will appear once. So if we seed the accumulator with the final index, the accumulator's final value will be the number that is in the list twice.

I would appreciate it if someone could help me understand the logic behind this approach (with a small example!).

Upvotes: 10

Views: 5844

Answers (4)

Jon

Reputation: 437584

Assume you have an accumulator

int accumulator = 0;

At each step of your loop, you XOR the accumulator with i and v, where i is the index of the loop iteration and v is the value in the ith position of the array.

accumulator ^= (i ^ v);

In the simplest case (a sorted array with no duplicate), i and v will be the same number, so you will end up doing

accumulator ^= (i ^ i);

But i ^ i == 0, so this is a no-op and the value of the accumulator is left untouched. At this point I should say that the order of the numbers in the array does not matter because XOR is commutative and associative, so even if the array is shuffled to begin with, the result at the end will still be 0 (the initial value of the accumulator).

Now what if a number occurs twice in the array? This number will appear three times in the XORing (once for the index equal to the number, once for the normal appearance of the number, and once for the extra appearance). Furthermore, one of the other numbers will appear only once (only as an index).

This solution assumes that the number that appears only once is equal to the last index of the array, or in other words, that the numbers in the array form a contiguous range starting from the first index to be processed (edit: thanks to caf for the heads-up comment; this is what I had in mind, but I messed it up when writing). With this (N appears only once) as a given, consider that starting with

int accumulator = N;

effectively makes N appear twice in the XORing as well. At this point, every number appears exactly twice except the one number that appears three times. Since the twice-appearing numbers XOR out to 0, the final value of the accumulator will be equal to the number that appears three times (i.e. the duplicated one).
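
Putting the pieces together, a minimal sketch in C might look like the following. The function name `find_duplicate_xor` is made up for illustration, and the sketch assumes positions are numbered from 1 (so position i + 1 is paired with `a[i]`), to match values that start at 1. Under that numbering the array has n + 1 elements holding 1..n plus one duplicate, and the "final index" used as the seed is n + 1, i.e. `len`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a holds len = n + 1 values, namely every integer
   in 1..n exactly once plus one duplicated value. */
unsigned find_duplicate_xor(const unsigned *a, size_t len)
{
    assert(len >= 2);

    /* Seed with the final 1-based index (n + 1), which otherwise
       would appear only once in the XORing. */
    unsigned accumulator = (unsigned)len;

    for (size_t i = 0; i < len; i++)
        accumulator ^= (unsigned)(i + 1) ^ a[i];  /* 1-based index ^ value */

    /* Every number now appeared an even number of times except the
       duplicate, which appeared three times and therefore survives. */
    return accumulator;
}
```

For the array 3, 1, 4, 2, 3 the accumulator goes 5, 7, 4, 3, 5, 3, so the function returns 3. Note that if the values really do not satisfy the contiguous-range assumption, the result is meaningless; nothing in this sketch detects that.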

Upvotes: 8

Lundin

Reputation: 214475

The question is: are you interested in knowing how to do clever but purely academic xor tricks with little relevance to the real world, or do you want to know this because in the real world you may write programs that use arrays? This answer addresses the latter case.

The no-nonsense solution is to go through the whole array and sort it as you go. While you sort, make sure there are no duplicate values, i.e. implement the abstract data type "set". This will probably require a second array to be allocated, and the sorting will be time consuming. Whether it is more or less time consuming than clever xor tricks, I don't know.
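
As a rough illustration, here is a simpler variant of that idea rather than a true set implementation: copy the array, sort the copy with the standard `qsort`, and scan neighbouring elements. The names `compare_ints` and `find_duplicate_sorted` are made up for this sketch, and the return-code convention is an assumption, not something from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort; avoids the overflow of "x - y". */
static int compare_ints(const void *lhs, const void *rhs)
{
    int x = *(const int *)lhs;
    int y = *(const int *)rhs;
    return (x > y) - (x < y);
}

/* Hypothetical helper. Returns 1 and stores a duplicated value in *out,
   0 if there is no duplicate, -1 if the copy could not be allocated.
   The original array is left untouched. */
int find_duplicate_sorted(const int *a, size_t len, int *out)
{
    assert(a != NULL && out != NULL);
    if (len < 2)
        return 0;

    int *copy = malloc(len * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, a, len * sizeof *copy);
    qsort(copy, len, sizeof *copy, compare_ints);

    int found = 0;
    for (size_t i = 1; i < len; i++) {
        if (copy[i] == copy[i - 1]) {   /* equal neighbours after sorting */
            *out = copy[i];
            found = 1;
            break;
        }
    }

    free(copy);
    return found;
}
```

Unlike the xor trick, this makes no assumption about the range of the values, at the cost of the extra array and the sort.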

However, what good is an array of n unsorted values to you in the real world? If they are unsorted we have to assume that their order is important somehow, so the original array might have to be preserved. If you want to search through the original array or analyse it for duplicates, median value, etc., you really want a sorted version of it. Once you have it sorted you can binary search it in O(log n).

Upvotes: 0

Hammerite

Reputation: 22340

Each number between 1 and 10,001 inclusive appears as an array index. (Aren't C arrays 0-indexed? Well, it doesn't make a difference provided we're consistent about whether the array values and indices both start at 0 or both start at 1. I'll go with the array starting at 1, since that's what the question seems to say.)

Anyway, yes, each number between 1 and 10,001 inclusive appears, precisely once, as an array index. Each number between 1 and 10,000 inclusive also appears as an array value precisely once, with the exception of the duplicated value which occurs twice. So mathematically, the calculation we're doing overall is the following:

1 xor 1 xor 2 xor 2 xor 3 xor 3 xor ... xor 10,000 xor 10,000 xor 10,001 xor D

where D is the duplicated value. Of course, the terms in the calculation probably don't appear in that order, but xor is commutative, so we can rearrange the terms however we like. And n xor n is 0 for each n. So the above simplifies to

10,001 xor D

Xor this with 10,001 and you get D, the duplicated value.
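
To see the same cancellation at a small scale, here is a worked example with made-up numbers (not from the question): the five-element array 3, 1, 4, 2, 3 at indices 1 to 5, so the values 1 to 4 all appear, the final index is 5, and D = 3. Writing xor as $\oplus$:

```latex
\begin{align*}
&1 \oplus 1 \oplus 2 \oplus 2 \oplus 3 \oplus 3 \oplus 4 \oplus 4 \oplus 5 \oplus D \\
&\quad = 0 \oplus 0 \oplus 0 \oplus 0 \oplus 5 \oplus D = 5 \oplus D \\
&5 \oplus 3 = 101_2 \oplus 011_2 = 110_2 = 6 \\
&6 \oplus 5 = 110_2 \oplus 101_2 = 011_2 = 3 = D
\end{align*}
```

The third line is what the loop leaves behind when D = 3, and the last line is the final xor with the last index, which recovers D.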

Upvotes: 3

Nick Shaw

Reputation: 2113

The logic is that you only have to store the accumulator value, and only need to go through the array once. That's pretty clever.

Of course, whether this is the best method in practice depends on how much work it is to calculate the exclusive or, and how large your array is. If the values in the array are randomly distributed, it may be quicker to use a different method, even if it uses more memory, since the duplicate value is likely to be found well before you have checked the entire array.
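
One such alternative, sketched here under assumptions of my own, is a bit map of values already seen, which lets the loop stop at the first repeat. The function name `first_duplicate_bitmap`, the `max_value` parameter and the return convention are all made up for illustration; values are assumed to lie in 1..max_value.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper. Returns the first value seen for a second time,
   0 if no value repeats, or -1 if the bit map could not be allocated.
   Every element of a must lie in 1..max_value. */
int first_duplicate_bitmap(const int *a, size_t len, int max_value)
{
    assert(max_value >= 1);

    size_t bits_per_word = sizeof(unsigned) * CHAR_BIT;
    size_t words = (size_t)max_value / bits_per_word + 1;
    unsigned *seen = calloc(words, sizeof *seen);   /* all bits start at 0 */
    if (seen == NULL)
        return -1;

    int result = 0;
    for (size_t i = 0; i < len; i++) {
        assert(a[i] >= 1 && a[i] <= max_value);
        size_t word = (size_t)a[i] / bits_per_word;
        unsigned mask = 1u << ((size_t)a[i] % bits_per_word);

        if (seen[word] & mask) {    /* bit already set: value seen before */
            result = a[i];
            break;                  /* early exit, rest of array is skipped */
        }
        seen[word] |= mask;
    }

    free(seen);
    return result;
}
```

For values up to 1,000,000 the bit map takes roughly 125 KB, and on the array 3, 1, 3, 2, 4 the loop stops at the third element.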

Of course if the array is sorted to begin with, things are considerably easier. So it depends very much on how the values are distributed throughout the array.

Upvotes: 0
