Peter

Reputation: 149

Count distinct values in an array - C++

I'm trying to teach myself (re-learn) C++ and doing problems from books and tests online to get some practice. I came across this problem which has left me a little confused. How would I best go about it?

I have to write a function

class Solution { public int distinct (int [] A); }

that returns the number of distinct values in the array A. I can assume that the array length is between 0 and 100,000, and that the elements are all integers in the range -1,000,000 to 1,000,000. Any ideas? I was thinking of looping through and counting up for each value, but that's probably really inefficient, right? Thanks in advance.

Upvotes: 0

Views: 14560

Answers (5)

sehe

Reputation: 392911

Edit: updated to include a space-optimized algorithm as well, just for fun.

You can use a std::set to contain the unique values. Just copy the array elements into a set (any way you like), and count the number of unique elements from the set afterwards.

Here is a rather succinct bit of code that doesn't even require you to specify the size of the array (though normally in C++ you'd be using a std::vector anyway):

See it live on http://ideone.com/rpWGS (which contains test data and output)

#include <algorithm>  // std::sort, std::unique
#include <cstddef>    // std::size_t
#include <set>

class Solution 
{ 
   public: 

     // using std::set (max O(n) additional storage)
     template<std::size_t N>
         static std::size_t distinct (int (&a)[N])
     {
         return std::set<int>(a, a+N).size();
     }

     // using std::unique (in-place mutation: sorts a, then compacts the
     // unique values to the front; no additional storage)
     template<std::size_t N> 
         static std::size_t distinct_optim(int (&a)[N])
     {
         std::sort(a, a+N);
         int* newend = std::unique(a, a+N);
         return newend - a; 
     }

};
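
Since the templates above only accept built-in arrays, here is a minimal sketch of the same two ideas for a std::vector<int>. The names distinct_set and distinct_sorted are made up for this illustration. The second function takes its argument by value, so the caller's data is left untouched at the cost of an O(n) copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: copy the values into a std::set and report its size.
std::size_t distinct_set(const std::vector<int>& v)
{
    return std::set<int>(v.begin(), v.end()).size();
}

// Hypothetical helper: sort a private copy, then let std::unique compact it.
// The parameter is taken by value, so the caller's vector is not modified.
std::size_t distinct_sorted(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    return static_cast<std::size_t>(std::unique(v.begin(), v.end()) - v.begin());
}
```

Both functions should agree on any input; which one is preferable depends on whether the extra copy or the set's per-node overhead matters more in your environment.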

Upvotes: 6

paxdiablo

Reputation: 881273

To get the number of distinct values in an array, I can see two possibilities.

The first is to sort them and then count the number of transitions (adding one). For example, the following list:

1 1 1 1 2 2 3 4 4 5
       ^   ^ ^   ^

has four transitions, hence five distinct values.

The other possibility is to set up an array of "booleans" indicating whether a number had been seen before, pseudocode such as (in your case):

def countDistinct (array):
    def notSeenYet[-1,000,000..1,000,000] as all true
    count = 0
    for each value in array:
        if notSeenYet[value]:
            notSeenYet[value] = false
            count = count + 1
    return count

The first requires a sort which would be at best O(n log n) time complexity. This is unlikely to be a serious problem for 100,000 elements but you may not want the array modified in any way (which would require a copy, O(n) space complexity).

The second is O(n) time complexity and constant storage for your case. Two million boolean values may be of concern, depending on your environment but, if it's available, that would be better, assuming that time is your main concern (and it usually is).
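
The pseudocode for the second approach might be sketched in C++ roughly as follows. The bounds are taken from the question; the names kMinValue, kMaxValue and count_distinct_flags are hypothetical, and the flag is inverted relative to the pseudocode (seen rather than notSeenYet) so that the default value false can be used. It assumes every element really lies within the stated range.

```cpp
#include <cstddef>
#include <vector>

// Assumed bounds, taken from the problem statement.
const int kMinValue = -1000000;
const int kMaxValue =  1000000;

// Hypothetical helper: counts distinct values, assuming every element
// lies in [kMinValue, kMaxValue]. Out-of-range input is not checked.
std::size_t count_distinct_flags(const std::vector<int>& values)
{
    // One flag per possible value; index 0 corresponds to kMinValue.
    std::vector<bool> seen(static_cast<std::size_t>(kMaxValue - kMinValue) + 1, false);
    std::size_t count = 0;
    for (int value : values) {
        const std::size_t index = static_cast<std::size_t>(value - kMinValue);
        if (!seen[index]) {   // first time this value appears
            seen[index] = true;
            ++count;
        }
    }
    return count;
}
```

Subtracting kMinValue shifts the negative values into valid indices, which is the detail the pseudocode's [-1,000,000..1,000,000] index range glosses over.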

Upvotes: 1

Jerry Coffin

Reputation: 490108

Your solution is reasonably efficient in time (in fact, about as efficient as possible, in terms of time complexity), but not in space: to count the values, you need an array sized to the range of the possible values, so to count the instances in your array of 100,000 items you need an auxiliary array of ~2,000,000 items (covering the range from -1,000,000 to 1,000,000).

You have a couple of ways to avoid/reduce that. One is to just store one bit for each possible input, and set the bit when you see that input. This has the same basic complexity, but reduces the space for the count to the minimum necessary (i.e., you don't really care how many times any input has occurred, only whether it occurred or not). In C++, the obvious way to do this would be std::vector<bool>. While often maligned, in this case, vector<bool> does exactly what you're looking for.

Another possibility would be to use a sparse mapping from the input numbers to the count/bit. Especially when your range is much larger than the number of inputs, this could save quite a bit of space (the space taken will be proportional to the number of inputs, not the range). In C++, the obvious way to do this would be std::set<int>. To maintain the same expected complexity (O(N) instead of O(N log N)), you'd want to use a std::unordered_set instead.
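
As a rough illustration of the sparse-mapping idea, a hash-based version could look like the sketch below. The function name count_distinct_hashed is hypothetical, and std::unordered_set requires C++11 or later.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: expected O(N) time, with space proportional to the
// number of distinct inputs rather than to the range of possible values.
std::size_t count_distinct_hashed(const std::vector<int>& values)
{
    // The range constructor inserts every element; duplicates are dropped.
    std::unordered_set<int> seen(values.begin(), values.end());
    return seen.size();
}
```

Unlike the flag-array approach, this makes no assumption about the range of the inputs, so it keeps working if the bounds in the problem statement change.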

Another possibility is to sort the inputs, then eliminate duplicates. This generally keeps auxiliary storage to a minimum, but generally requires slightly longer to execute (O(N log N) instead of O(N)). For this, you'd probably use std::vector, std::sort, and std::unique.

Upvotes: 6

Vlad

Reputation: 18633

I can think of two options:

1) sort the vector using quick sort or merge sort, and then iterate over the sorted vector, counting up each time you encounter a value different from the current value.

2) set up a std::vector<bool> of size 2,000,001 (one entry per possible value from -1,000,000 to 1,000,000, indexed by value + 1,000,000) and set entries to true as you iterate over your array. Afterwards, count the number of true values. I say vector<bool> because it's optimized for efficient storage, i.e. it probably stores 8 bools in a byte.

Upvotes: 1

twerdster

Reputation: 5017

Sort the array A. Then go through the sorted array and count the number of times the difference between two consecutive numbers is nonzero; the number of distinct values is that count plus one. Make sure you take care of the edges of the array and the cases where the array is empty or has size 1.
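
A minimal sketch of this sort-and-scan approach, with the edge cases handled explicitly, might look like this. The function name count_distinct_transitions is made up for the example, and it works on a copy so the caller's array keeps its original order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a copy and counts positions where neighbours differ.
// The parameter is taken by value, so the caller's array is left untouched.
std::size_t count_distinct_transitions(std::vector<int> a)
{
    if (a.empty()) return 0;            // edge case: no elements, no values
    std::sort(a.begin(), a.end());
    std::size_t count = 1;              // the first element is always a new value
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[i] != a[i - 1]) ++count;  // a change between neighbours is a new value
    }
    return count;
}
```

Starting the count at 1 and the loop at index 1 covers the single-element case without a special branch, and comparing neighbours with != rather than subtracting them sidesteps any overflow concerns for inputs with a wider range.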

Upvotes: 2
