Seth Keno
Seth Keno

Reputation: 679

efficiently find amount of integers in a sorted array

I'm studying for a test, and found this question.

You are given a sorted array of integers for example:

{-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99}

Write a method:

Public static int count (int[] a, int x)

that returns the amount of times, number 'x' is in the array.

for example:

x = -5, it returns 2
x = 2, it returns 5
x = 8, it returns 0

I need to write it as efficient as possible, please don't give me the answer (or write it if you wish, but I won't look), my idea is to do a binary search, and then go both edges (backward and forward) of the value that I find and with the index numbers, return the correct answer, my questions are:

  1. is it the most efficient way?
  2. won't it be O(n) in the worst case? (when the array is filled with a single number) -

If so - then why should I do a binary search?

Upvotes: 10

Views: 722

Answers (5)

Sebastian
Sebastian

Reputation: 8005

IMHO this is the most efficient solution: Others may have mentioned a similiar approach, but I think this one is the easiest to explain and the easiest to understand, also it has a modification that will speed up the process in practice:

Basically the idea is the find the smallest and largest index of the occurence. Finding the smallest is O(log N) using binary search (using Newton's method to actually increase the performance in the average case is a possible improvement). If you don't know how to modify binary search to find the smallest index, the trivial modification is to look for the element with value (p - 0.5) - obviously you will not find that value in the integer array, but if the binary search terminates the index will be the one next to where the recursion stops. You will just need to check whether it exists at all. This will give you the smallest index.

Now in order to find the largest index, again you will have to start a binary search, this time using the smallest index as the lower bound and (p + 0.5) as the search target, this is guaranteed to be O(log N), in the average case it will be O(log N/2). Using Newton's method and taking the values of the upper and lower bounds into account will in practice improve the performance.

Once you have found the largest and smallest index, the difference between them obviously is the result.

For evenly distributed numbers, using the Newton modification will drastically improve the runtime (in the case of a continuous equidistant sequence of numbers it will be O(1) (two or three steps) to find the smallest and greatest values), although the theoretical complexity still is O(log N) for arbitrary input.

Upvotes: 0

stinepike
stinepike

Reputation: 54672

Modify your binary search to find the first and last occurrence of the given input, then the difference between those two indexes is the result.

To find a first and last occurrence using binary search you need to change a bit from the usual binary search algorithm. In binary search the value is returned when a match is found. But here unlike the usual binary search you need to continue searching until you find a mismatch.

helpful links

finding last occurence, finding first occurance

A bit update

after you find the first occurrence then you can use that index as a starting point of the next binary search to find the last.

Upvotes: 15

Ukkonen
Ukkonen

Reputation: 11

Actually there is a slightly better solution than the given solutions! It is a combination of two different ways to do binary search.

First you do a binary search to get the first occurence. This is O(log n)

Now, starting with the first index you just found, you do a different kind of binary search: You guess the frequency of that element F, by starting with a guess of F=1 and doubling your estimate and check if the element repeats. This is guaranteed to be O(log F) (where F is the frequency).

This way, the algorithm runs in O(log N + log F)

You do not have to worry about the distribution of the numbers!

Upvotes: 1

Nikunj Banka
Nikunj Banka

Reputation: 11365

Do binary Search to find the first occurence. Do binary search to find the last occurence. The number of occurences is equal to the number of numbers between the 2 indices found.

Working code:

public class Main {
    public static void main(String[] args){
        int[] arr = {-5, -5, 1, 1, 1, 1, 1, 1, 
                                    1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        int lo = getFirst(arr, -5);
        if(lo==arr.length){ // the number is not present in the array.
            System.out.println(0);
        }else{
            int hi = getLast(arr, -5);
            System.out.println((hi-lo+1));
        }
    }

    // Returns last occurence of num or arr.length if it does not exists in arr.
    static int getLast(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = (lo+hi)/2;
            if(arr[mid]==num){
                ans = mid;
                lo = mid+1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }

    // Returns first occurence of num or arr.length if it does not exists in arr.
    static int getFirst(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = (lo+hi)/2;
            if(arr[mid]==num){
                ans = mid;
                hi = mid-1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }
}

Upvotes: 5

Two solutions come to mind:

1) Do Binary Search alright, but maintain that invariant that it finds the first occurence. Then do a linear search. This will be Theta(log n + C) where C is the count.

Programming Pearls by Jon Bentley has a nice write up, where he mentions looking for the first occurence is actually more efficient than looking for any occurence.

2) You can also do two binary searches, one for the first occurence, and one for the last, and take the difference of the indices. This will be Theta(log n).


You can decide which solution to apply based on the expected value of C. If C = o(log n) (yes small o), then looking for the first occurence and doing linear search will be better. Otherwise do two binary searches.

If you don't know the expected value of C, then you might be better off with solution 2.

Upvotes: 5

Related Questions