Alceu Costa
Alceu Costa

Reputation: 9889

How can I compute the average cost for this solution of the element uniqueness problem?

In the book Introduction to the Design & Analysis of Algorithms, the following solution is proposed to the element uniqueness problem:

ALGORITHM UniqueElements(A[0 .. n-1])
// Determines whether all the elements in a given array are distinct
// Input: An array A[0 .. n-1]
// Output: Returns "true" if all the elements in A are distinct
//         and false otherwise.
for i := 0 to n - 2 do
   for j := i + 1 to n - 1 do
      if A[i] = A[j] return false
return true

How can I compute the average cost (i.e. number of comparisons for a given n) for this algorithm? What is a reasonable assumption about the input?

Upvotes: 1

Views: 1404

Answers (4)

nategoose
nategoose

Reputation: 12382

If you need an exact value for a given input length then this will work (thought it is overkill):

ALGORITHM complexity_counter_of_UniqueElements(A[0 .. n-1]) 
// Determines whether all the elements in a given array are distinct 
// Input: An array A[0 .. n-1] 
// Output: Returns "true" if all the elements in A are distinct 
//         and false otherwise. 
counter acc = 0;
for i := 0 to n - 2 do 
   for j := i + 1 to n - 1 do 
      //if A[i] = A[j] return false 
      acc := 1 + acc
return acc

It is easy to see that this algorithm is O(nn) though, which is probably what you're interested in. The algorithm compares every element by every other element. If you created a table with the results of this the table would have to be at least ((nn)/2) to hold all of the results.

edit: I see now what you were really asking.

You need to compute the probability that each comparison may result in a match. This depends on the size of your elements (things that live in A) and what kind of distribution they have.

Assuming a random distribution the chance that any two random A[x] == A[y] where x != y would be 1.0/(number of possible values of element).

P(n)
total_chance := 0.0
for i:= 0 to n - 2 do
   for j := i + 1 to n - 1 do
      this_chance := 1.0/(number_of_possible_values_of_element)
      total_chance :=  total_chance + ((1-total_chance)*this_chance)
      // This should be the the probability of the newly compared pair being equal weighted
      // to account for the chance that it actually mattered (ie, hadn't found a match earlier)
return total_chance

O((1-P(n))nn), but P(n) is <= 1, so it is less than n*n

Upvotes: 0

Jack
Jack

Reputation: 133577

Since you iterate twice over the array in a nested way, worst case cost should be O(n²)..

a closer look would show you that since you start second loop from the element after the one you are checking you have:

N-1 + (N-2) + (N-3) + (N-4) + (N-5) + .... + 1

comparisons so the exact average cost would be N*(N-1) / 2

According to your comment I think that you should assume that every element is uniformely chosen between the set of possible values.

This means that the element A[i] has the probability 1/n of being exactly a specified value. Starting from here you can do your considerations:

  • first of all you choose a whatever element of the array A[i]. What is the probability of having A[i] == A[i+1]? It's 1/n² since both elements are supposed to be random.
  • what is the probability of having A[i] == A[i+2]? You have 1/n * (n-1/n) * 1/n because you have respectively a specified element, anything except the specified one, and the same specified element
  • you can extend the argumentation over any element A[k] with k>i, then you add all probabilities and you will have which is the average probability of having two unique element in the array starting from a specified one.
  • you extend thing thing further considering that you can start from any A[i] with i = 0..l-1. Of course every different i will have different probabilities because array will be shorter as i increases.

NOTE: n is the number of different items that can be inserted into the array, not its length.

After this you can easily estimate your average comparison cost..

Upvotes: 0

John Feminella
John Feminella

Reputation: 311496

If you don't know anything else about the input, then a reasonable assumption is that it's random. If so, and if the space of possible choices is large (e.g. the set of all real numbers), then the likelihood of two elements being the same is vanishingly small. (Mathematically, we say that the event of two randomly selected real numbers being distinct is almost sure.)

That means that your average case is equal to your worst case: you'll have to scan every element in the array to be sure that each one is distinct. Then the number of comparisons is n * (n - 1) / 2, or the sum of 1 ... n.

Upvotes: 1

IVlad
IVlad

Reputation: 43477

I think it's hard to talk about an average cost. The worst case cost is O(n2) and happens either when the repeated elements are towards the end of the array, for example something like this:

2 3 4 5 ... 1 1

Or when the array contains nothing but distinct elements.

The best case is when the array starts with two repeated elements, like this:

1 1 ...

In which case the cost is a single comparison. Another good case is when there exists an element near the beginning of the array that repeats at the end of the array, something like this:

2 3 4 1 ... 1

This will be (closer to) O(n).

The fact is the cost depends on the input, so you might as well assume you're going to always hit a worst case and try to find a better algorithm, maybe something based on sorting the array or on using hash tables, giving you O(nlog n) worst case and O(n) average case respectively.

Upvotes: 0

Related Questions