Reputation: 141

Finding missing number in an unsorted array with repeated values

I have an array of size "n" with unsorted and reapeated integers between [0-99]. I already know that there is one number missing. So, what is the fastest way to find the missing number? This is my solution until now in C.

int find_missing_number(int array[], int n)
{
    char checker[100] = {0};
    int xor = 0, counter = 0, i, temp;
    //xor = (n%4==0) ? 0 : (n%4==1) ? n-1 : (n%4==2) ? 1 : n;

    for(i = 0; i < n; i++)
        if(checker[temp = array[i]] == 0)
        {
            checker[temp] = 1;
            xor ^= temp;
            if(++counter == 99)
                return xor;
        }

    return -1;
}

Upvotes: 0

Answers (5)

cosmos

Reputation: 2303

As there are repeated elements, which I missed to see before writing my previous solution, I believe complexity of your solution is best.

I would suggest, instead of xor'ing the elements, you could just read the checker array to find missing element.

int find_missing_number(int array[], int n)    
{
    int checker[100] = {0};

    for(int i = 0; i < n; i++)
        checker[array[i]]++;

    for(int i=0; i <100; i++)
        if(0 == checker[i])
            return i;
}

This will reduce the operations which depends on n, making the code faster for bigger n.

Upvotes: 1

Joël Hecht

Reputation: 1842

Javia1492 had the main idea in his answer: summing the numbers.

The checker array prevents you to sum up twice the same number.

The counter variable counts the numbers of different numbers to end the search before testing the whole n numbers in array once you already have 99 different values.

It also works with the xor operator as you did but I don't like it a lot because that 1^2^3^..^99 == 0 is some sort of particular case you can't generalize with 1^2^..^N.

int find_missing_number(int array[], int n)
{
    char checker[100];
    register int sum = 0, counter = 0, i, temp;

    memset(checker, 0, sizeof(checker));
    for (i = 0; i < n; i++)
    {
        if (checker[temp = array[i]] == 0)
        {
            checker[temp] = 1;
            sum += temp;
            if (++counter == 99)
               break;
        }
    }
    return 4950 - sum; // (100*99)/2 - sum
}

Upvotes: 1

JS1

Reputation: 4767

The original program is already very fast. However, given the problem statement, it can be made faster by changing this:

        if(++counter == 100)
            return -1;

to this:

        if(++counter == 99)
            return xor;

This change causes the program to immediately return the answer when 99 distinct elements have been found. So if the array were very large, only a small portion of the array would be processed and the rest of the array can be ignored. This depends on the problem statement which states that it is known that one element is missing.

Upvotes: 1

John Bollinger

Reputation: 181094

Given that you may have duplicates, I don't see any way around keeping some kind of record of which values have been seen that you can read out on a per-value basis. Approaches based on one variety or another of one-way hashing function (two of which were proposed and then retracted) cannot do the job on their own. It may be, however, that you can save the effort of scanning your record of seen values after populating it by combining it with a hash-like function. For example,

int find_missing_number(int array[], int n)
{
    char checker[100] = {0};
    int xor = XOR_OF_1_TO_100;
    int i;

    for(i = 0; i < n; i++) {
        xor ^= (checker[array[i]] ? 0 : array[i] + 1);
        checker[array[i]] = 1;
    }

    return xor - 1;
}

This is admittedly pretty similar to your version, but I'm more confident that it will work, and I think that it probably runs slightly faster.

Note that I do not declare any variables register -- the compiler is much better than I am at choosing which variables should live in registers and which not, and it is not obligated to take my advice on the matter in any case.

Also, the elements of the checker array have type char, allowing four times as many to reside in a CPU cache line at once than if they were type int (assuming 1-byte chars and 4-byte ints).

Note too that I avoid counting distinct values or otherwise branching inside the loop (the ternary expression can be implemented without a branch). Avoiding the counting and conditional statement will speed the case where indeed one value is missing, and might or might not in practice slow down the case where none is missing. The gain may arise from more than just having less code -- sometimes such simplifications can allow the compiler to generate more efficient instruction sequences, too.

Of course, the whole thing is junk (and the problem is not well specified) if more then one value may be missing.

Upvotes: 1

Javia1492

Reputation: 892

You can do this in O(n). Iterate through the array and compute the sum of all numbers. Now, sum of natural numbers from 1 to N, can be expressed as Nx(N+1)/2. In your case N=99. Subtract the sum of the array from Nx(N+1)/2, where N=99. That is the missing number. The empty slot can be detected during the iteration in which the sum is computed.

Im not entirely sure if this is a faster method or not, but an option for you to try.

Edit: This would work if you didnt have repeated values.

Upvotes: 1

Finding missing number in an unsorted array with repeated values

Answers (5)

Related Questions