Reputation: 19

Find index of Nth occurrence of a number using Binary Search

I have a finite array whose elements are only -1, 0 or 1. I want to find the index of the Nth occurrence of a number (say 0).

I can iterate through the entire array, but I'm looking for a faster approach. I thought of using binary search, but I'm having trouble modelling the algorithm. How do I proceed with binary search in this case?

Upvotes: 1

Answers (3)

Francis Cugler

Reputation: 7905

You want the index of the Nth occurrence of some element T in an array, vector or other container, so this post may be of some help to you.

In your question and in the comments you stated that the container is unsorted, that you were thinking of using a binary search, and that you were having trouble modelling the algorithm.

This post is an example of the development process behind designing such an algorithm. The search here is a linear one, because a binary search is not suitable for unsorted data. The same process can be applied to other algorithms and data structures, including binary searches and hash tables.

- 1st Build

#include <iostream>
#include <vector>

struct Index {
    static unsigned counter; // Static Counter

    unsigned location; // index location of Nth element
    unsigned count; // How many of this element up to this point

    Index() : location( 0 ), count( 0 ) {}
};    
unsigned Index::counter = 0;

// These typedefs are not necessarily needed; 
// just used to make reading of code easier.
typedef Index IndexZero; 
typedef Index IndexPos1;
typedef Index IndexNeg1;

template<class T>
class RepititionSearch {
public:
    // Some Constants to compare against: don't like "magic numbers"
    const T NEG { -1 }; 
    const T ZERO { 0 };
    const T POS { 1 };

private:
    std::vector<T> data_;   // The actual array or vector of data to be searched
    std::vector<Index> indices_; // A vector of Indexes - record keeping to prevent multiple full searches.

public:
    // Instantiating a search object requires an already populated container
    explicit RepititionSearch ( const std::vector<T>& data ) : data_( data )  {
        // make sure indices_ is empty upon construction.
        indices_.clear();
    }

    // method to find the Nth occurrence of object A
    unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
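        // Simple bounds checking: occurrences are counted from 1,
        // and there cannot be more occurrences than elements
        if ( NthOccurrence == 0 || NthOccurrence > data_.size() ) {
            // Can throw error or print message...;
            return -1; // wraps to the largest unsigned value
        }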

        IndexZero zeros;
        IndexPos1 ones;
        IndexNeg1 negOnes;

        // Clear out the indices_ so that each consecutive call is correct
        indices_.clear();
        unsigned idx = 0;
        for ( auto e : data_ ) {
            if (  element == e && element == NEG ) {
                ++negOnes.counter;
                negOnes.location = idx;
                negOnes.count = negOnes.counter;
                indices_.push_back( negOnes );
            }

            if ( element == e && element == ZERO ) {
                ++zeros.counter;
                zeros.location = idx;
                zeros.count = zeros.counter;
                indices_.push_back( zeros );
            }

            if ( element == e && element == POS ) {
                ++ones.counter;
                ones.location = idx;
                ones.count = ones.counter;
                indices_.push_back( ones );
            }
            idx++;
        } // for each T in data_

        // Reset static counters
        negOnes.counter = 0;
        zeros.counter = 0;
        ones.counter = 0;

        // Now that we saved a record: find the nth occurrence
        // This will not search the full vector unless it is last element
        // This has early termination. Also this vector should only be
        // a percentage of the original data vector's size in elements.
        for ( auto index : indices_ ) {
            if ( index.count == NthOccurrence) {
                // We found a match
                return index.location;
            } 
        }

        // Not Found
        return -1;
    }
};

int main() {

    // using the sample array or vector from User: Prune's answer!
    // (int rather than char: plain char may be unsigned, so -1 might not fit)
    std::vector<int> vec{ -1, 0, 1, 1, -1, -1, 0, 0, 0, -1, 1 };

    RepititionSearch<int> search( vec );
    unsigned idx = search.getNthOccurrence( 3, 1 );

    std::cout << idx << std::endl;

    std::cout << "\nPress any key and enter to quit." << std::endl;
    char q;
    std::cin >> q;
    return 0;
}

// output:
10

The value 10 is the correct answer: the 3rd occurrence of the value 1 is at index 10 in the original vector, since vectors are 0 based. The vector of indices is only used as bookkeeping for the lookup.

As you may have noticed, I made this a class template that accepts any basic type T stored in a std::vector<T>, as long as T can be compared with operator==.

AFAIK I do not think that there is any other searching method that is faster than this for the type of search that you are striving for, but don't quote me on it. However I think I can optimize this code a little more... just need some time to look at it closer.

This may appear to be a bit crazy, but it does work; it is just a bit of fun playing around with the code:

int main() {

    std::cout << 
    RepititionSearch<int>( std::vector<int>( { -1, 0, 1, 1, -1, -1, 0, 0, 0, -1, 1 } ) ).getNthOccurrence( 3, 1 ) 
              << std::endl;  
}

It can be done in a single expression and printed to the console without naming an instance of the class; a temporary object is still created.

- 2nd Build

This may not make the algorithm any faster, but it cleans up the code for readability. Here I removed the typedefs and used a single Index object. The three if statements then contained duplicate code, so I moved it into a private helper function. This is how the algorithm looks now:

struct Index {
    unsigned location;
    unsigned count;
    static unsigned counter;

    Index() : location(0), count(0) {}
};
unsigned Index::counter = 0;

template<class T>
class RepititionSearch {
public:
    const T NEG  { -1 };
    const T ZERO {  0 };
    const T POS  {  1 };

private:
    std::vector<T> data_;
    std::vector<Index> indices_;

public:
    explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {
        indices_.clear();            
    }

    unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
        if ( NthOccurrence == 0 || NthOccurrence > data_.size() ) {
            return -1;
        }

        indices_.clear(); 

        Index index;     
        unsigned i = 0;

        for ( auto e : data_ ) {
            if ( element == e && element == NEG ) {
                addIndex( index, i );
            }
            if ( element == e && element == ZERO ) {
                addIndex( index, i );
            }
            if ( element == e && element == POS ) {
                addIndex( index, i );
            }
            i++;
        }
        index.counter = 0;

        for ( auto idx : indices_ ) {
            if ( idx.count == NthOccurrence ) {
                return idx.location;
            }
        }

        return -1; 
    }

private:
    void addIndex( Index& index, unsigned inc ) {
        ++index.counter;
        index.location = inc;
        index.count = index.counter;
        indices_.push_back( index );
    }
};

- 3rd Build

To make this completely generic, so that it finds the Nth occurrence of any element T, the above can be reduced to the following. I also removed the static counter from Index and moved it to the private section of RepititionSearch, where it made more sense.

struct Index {
    unsigned location;
    unsigned count;
    Index() : location(0), count(0) {}
};

template<class T>
class RepititionSearch {    
private:
    static unsigned counter_;
    std::vector<T> data_;
    std::vector<Index> indices_;

public:
    explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {
        indices_.clear();
    }

    unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
        if ( NthOccurrence == 0 || NthOccurrence > data_.size() ) {
            return -1;
        }
        indices_.clear();

        Index index;
        unsigned i = 0;

        for ( auto e : data_ ) {
            if ( element == e ) {
                addIndex( index, i );
            }
            i++;
        }
        counter_ = 0;

        for ( auto idx : indices_ ) {
            if ( idx.count == NthOccurrence ) {
                return idx.location;
            }
        }
        return -1;
    }

private:
    void addIndex( Index& index, unsigned inc ) {
        ++counter_;
        index.location = inc;
        index.count = counter_;
        indices_.push_back( index );
    }    
};

template<class T>
unsigned RepititionSearch<T>::counter_ = 0;

- 4th Build

I have also written the same algorithm without a vector that holds index information. This version doesn't need the Index struct or the helper function. It looks like this:

template<class T>
class RepititionSearch {
private:
    static unsigned counter_;
    std::vector<T> data_;
public:
    explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {}

    unsigned getNthOcc( unsigned N, T element ) {
        if ( N == 0 || N > data_.size() ) {
            return -1;
        }

        unsigned i = 0;
        for ( auto e : data_ ) {
            if ( element == e ) {
                ++counter_;
                i++;
            } else {
                i++;
            }

            if ( counter_ == N ) {
                counter_ = 0;
                return i-1;
            }
        }

        counter_ = 0;

        return -1;
    }
};

template<class T>
unsigned RepititionSearch<T>::counter_ = 0;

Since we removed the secondary vector and the helper function, we don't even need a class to hold the container. We can just write a function template that takes a vector and applies the same algorithm. This version also has no need for a static counter.

- 5th Build

template<class T>
unsigned RepititionSearch( const std::vector<T>& data, unsigned N, T element ) {    

    if ( data.empty() || N == 0 || N > data.size() ) {
        return -1;
    }

    unsigned counter = 0;
    unsigned i = 0;

    for ( auto e : data ) {
        if ( element == e ) {
            ++counter;
            i++;
        } else {
            i++;
        }

        if ( counter == N ) {
            return i - 1;
        }
    }
    return -1;
}

Yes, this is a lot to take in, but these are the steps involved in writing and designing an algorithm and refining it down to simpler code. As you have seen, this code went through 5 builds. I went from using a struct, a class, typedefs and a static member with multiple stored containers, to removing the typedefs and putting the repeated code into a helper function, to removing the secondary container and the helper function, down to not needing a class at all and just writing a function that does what it is supposed to do.

You can apply similar steps to build a function that does what you want or need it to do. The same process works for writing a binary search, a hash table, etc.

Upvotes: 0

Francis Cugler

Reputation: 7905

The OP stated that the ordered structure is important and that the vector or array is unsorted. To the best of my knowledge there is no faster search algorithm than linear for unsorted data. Here are a few links for references:

Taken together, the references should be enough evidence to conclude that if the data in the array or vector is unsorted and must keep its order, there is no choice but to use linear iteration. A hashing technique may be possible, but it can be tricky, and binary search only works on sorted data.
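As an illustration of why sortedness matters, here is a minimal sketch of one way binary search can still be used, assuming a one-time linear pass over the array is acceptable. The running count of matches is non-decreasing, so it is sorted even when the data is not, and std::lower_bound can search it. The function names buildPrefixCounts and nthOccurrenceByPrefix are hypothetical, and std::optional assumes C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: prefix[i] is the number of elements equal to
// `target` in data[0..i]. Built with one linear pass.
std::vector<std::size_t> buildPrefixCounts(const std::vector<int>& data, int target) {
    std::vector<std::size_t> prefix;
    prefix.reserve(data.size());
    std::size_t seen = 0;
    for (int value : data) {
        if (value == target) {
            ++seen;
        }
        prefix.push_back(seen);
    }
    return prefix;
}

// Hypothetical helper: index of the n-th occurrence (counted from 1).
// The counts never decrease, so the first position whose count reaches n
// is exactly where the n-th occurrence sits.
std::optional<std::size_t> nthOccurrenceByPrefix(const std::vector<std::size_t>& prefix,
                                                 std::size_t n) {
    if (n == 0) {
        return std::nullopt;
    }
    auto it = std::lower_bound(prefix.begin(), prefix.end(), n);
    if (it == prefix.end()) {
        return std::nullopt;  // fewer than n occurrences
    }
    return static_cast<std::size_t>(it - prefix.begin());
}
```

The first pass is still linear, so this only pays off when the same array is queried many times; each query after that costs O(log n).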

- Here is a good linear algorithm to find the Nth occurrence of T in data.

To solve your problem of finding the Nth occurrence of element T in a given unsorted array, vector or container you can use this simple function template:

It takes 3 parameters:
- a const reference to the container that is populated with data
- a const unsigned value N, the occurrence to find, counted from 1
- and a const value of template type T, the element you are searching for.
It returns an unsigned value: the index within the container of the Nth occurrence of element T.

template<class T>
unsigned RepititionSearch( const std::vector<T>& data, const unsigned N, const T element ) {    

    if ( data.empty() || N == 0 || N > data.size() ) {
        return -1;
    }

    unsigned counter = 0;
    unsigned i = 0;

    for ( auto e : data ) {
        if ( element == e ) {
            ++counter;
            i++;
        } else {
            i++;
        }

        if ( counter == N ) {
            return i - 1;
        }
    }
    return -1;
}

Break down of the algorithm

It first does some sanity checks:

It checks to see if the container is empty

It checks the value N to see if it is within the bounds [1, container.size()], since occurrences are counted from 1 and there cannot be more occurrences than elements

If any of these fail, it returns -1, which wraps to the largest unsigned value because the return type is unsigned; in production code this might throw an exception or return an error

We then have a need for 2 incrementing counters:

1 for the current index location

1 for the number of occurrences of element T

We then use a range-based for loop, which requires C++11 or higher

We go through each e in data

We check to see if the element passed into the function is equal to the current e in data

If the check passes, we pre-increment counter and post-increment i; otherwise we only post-increment i

After incrementing the counters we then check to see if the current counter is equal to the Nth value passed into the function

If the check passes we return the value of i-1, because i has already been incremented past the matching element

If the check fails here we then continue to the next iteration of the loop and repeat the process

If every e in data has been checked and fewer than N occurrences of T were found, we leave the for loop and the function returns -1; in production code this might throw an exception or return an error.

The worst case is either that there is no match or that the Nth occurrence of T is the very last e in data. Either way the loop visits every element once, which is O(n) for a container of n elements (not to be confused with the occurrence number N). For basic containers this should be efficient enough. If the container supports array indexing, accessing an item is O(1) once you know which index you want.
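The same linear scan can be sketched for any iterator range, so that it also covers plain arrays, and with std::optional in place of the -1 sentinel. This is only an illustration: the name findNthOccurrence is hypothetical, and std::optional assumes C++17.

```cpp
#include <cstddef>
#include <iterator>
#include <optional>

// Hypothetical variant of the linear search above.
// Returns the 0-based index of the n-th occurrence (n counted from 1)
// of `element` in [first, last), or std::nullopt if there is none.
template <class Iterator, class T>
std::optional<std::size_t> findNthOccurrence(Iterator first, Iterator last,
                                             std::size_t n, const T& element) {
    if (n == 0) {
        return std::nullopt;  // there is no "0th" occurrence
    }
    std::size_t matches = 0;
    std::size_t index = 0;
    for (; first != last; ++first, ++index) {
        // Count a match and stop as soon as the n-th one is reached.
        if (*first == element && ++matches == n) {
            return index;
        }
    }
    return std::nullopt;  // fewer than n occurrences in the range
}
```

For a vector v this would be called as findNthOccurrence(v.begin(), v.end(), 3, 1); for a plain array, std::begin and std::end supply the range. The caller checks has_value() instead of comparing against a wrapped -1.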

Note: I feel this answer should solve the problem. If you are interested in how the process of designing or modelling such an algorithm works, you can refer to my other answer.

AFAIK I do not think there is a better way to do this with unsorted array data, but don't quote me on it.

Upvotes: 0

Prune

Reputation: 77837

You cannot do this without at least one pass of O(N) pre-processing. From the standpoint of information theory alone, you must have knowledge of elements [0:k-1] to know whether element [k] is the one you want.

If you're going to make this search many times, then you can make a simple linear pass over the array, counting each element as you go. Store the indices in a 2-D array, so you can directly index whatever occurrence you want.

For instance, given [-1 0 1 1 -1 -1 0 0 0 -1 1], you can convert this to a 3xN array, idx:

[[0 4 5 9]
 [1 6 7 8]
 [2 3 10]]

The Nth occurrence of element I is idx[I+1][N-1].

After that initial O(N) pass, your look-up is O(1) time, using O(N) space.
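A minimal C++ sketch of this idea, assuming the values are limited to -1, 0 and 1 as in the question. The names OccurrenceTable, buildOccurrenceTable and nthOccurrence are hypothetical, and std::optional assumes C++17.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical type: one list of positions per value, rows ordered -1, 0, 1.
using OccurrenceTable = std::array<std::vector<std::size_t>, 3>;

// One linear pass: record the index of every element in the row for its value.
OccurrenceTable buildOccurrenceTable(const std::vector<int>& data) {
    OccurrenceTable idx;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] < -1 || data[i] > 1) {
            continue;  // assumption: anything outside {-1, 0, 1} is ignored
        }
        idx[static_cast<std::size_t>(data[i] + 1)].push_back(i);
    }
    return idx;
}

// Constant-time lookup: the n-th occurrence (counted from 1) of `value`
// is idx[value + 1][n - 1], if it exists.
std::optional<std::size_t> nthOccurrence(const OccurrenceTable& idx, int value, std::size_t n) {
    if (value < -1 || value > 1) {
        return std::nullopt;
    }
    const auto& positions = idx[static_cast<std::size_t>(value + 1)];
    if (n == 0 || n > positions.size()) {
        return std::nullopt;
    }
    return positions[n - 1];
}
```

The rows have different lengths, so a vector per value is used here rather than a fixed 3xN block; total storage is still one entry per element.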

Upvotes: 10
