Reputation: 19
I have a finite array whose elements are only -1,0 or 1. I want to find the index of Nth occurrence of a number (say 0).
I can iterate through the entire array, but I'm looking for a faster approach. I can think of using Binary Search, but having trouble modelling the algorithm. How do I proceed with Binary Search in this case?
Upvotes: 1
Views: 1723
Reputation: 7905
Since you are looking to search through an array
, a vector
or some container
where the search in question pertains to the index location of some element T
based on its Nth
occurrence within its container this post may be of some help to you:
According to your question as well as some of the comments in regards to it where you explicitly stated that your container is Unsorted
while you were thinking of using a binary search
and were having trouble with the process of modeling an algorithm:
This post here serves as an example of the development process towards the design of an algorithm in which it may help you achieve what you are looking for:
The search algorithm here is a linear one, where a binary search will not be suitable to your current needs:
This same process of building an algorithm can be applied to other types of algorithms including, binary searches, hash tables, etc.
struct Index {
static unsigned counter; // Static Counter
unsigned location; // index location of Nth element
unsigned count; // How many of this element up to this point
Index() : location( 0 ), count( 0 ) {}
};
unsigned Index::counter = 0;
// These typedefs are not necessarily needed;
// just used to make reading of code easier.
typedef Index IndexZero;
typedef Index IndexPos1;
typedef Index IndexNeg1;
template<class T>
class RepititionSearch {
public:
// Some Constants to compare against: don't like "magic numbers"
const T NEG { -1 };
const T ZERO { 0 };
const T POS { 1 };
private:
std::vector<T> data_; // The actual array or vector of data to be searched
std::vector<Index> indices_; // A vector of Indexes - record keeping to prevent multiple full searches.
public:
// Instantiating a search object requires an already populated container
explicit RepititionSearch ( const std::vector<T>& data ) : data_( data ) {
// make sure indices_ is empty upon construction.
indices_.clear();
}
// method to find the Nth occurrence of object A
unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
// Simple bounds checking
if ( NthOccurrence < 0 || NthOccurrence >= data.size() ) {
// Can throw error or print message...;
return -1;
}
IndexZero zeros;
IndexPos1 ones;
IndexNeg1 negOnes;
// Clear out the indices_ so that each consecutive call is correct
indices_.clear();
unsigned idx = 0;
for ( auto e : data_ ) {
if ( element == e && element == NEG ) {
++negOnes.counter;
negOnes.location = idx;
negOnes.count = negOnes.counter;
indices_.push_back( negOnes );
}
if ( element == e && element == ZERO ) {
++zeros.counter;
zeros.location = idx;
zeros.count = zeros.counter;
indices_.push_back( zeros );
}
if ( element == e && element == POS ) {
++ones.counter;
ones.location = idx;
ones.count = ones.counter;
indices_.push_back( ones );
}
idx++;
} // for each T in data_
// Reset static counters
negOnes.counter = 0;
zeros.counter = 0;
ones.counter = 0;
// Now that we saved a record: find the nth occurance
// This will not search the full vector unless it is last element
// This has early termination. Also this vector should only be
// a percentage of the original data vector's size in elements.
for ( auto index : indices_ ) {
if ( index.count == NthOccurrence) {
// We found a match
return index.location;
}
}
// Not Found
return -1;
}
};
int main() {
// using the sample array or vector from User: Prune's answer!
std::vector<char> vec{ -1, 0, 1, 1, -1, -1, 0, 0, 0, -1, 1 };
RepititionSearch <char> search( vec );
unsigned idx = search.getNthOccurrence( 3, 1 );
std::cout << idx << std::endl;
std::cout << "\nPress any key and enter to quit." << std::endl;
char q;
std::cin >> q;
return 0;
}
// output:
10
The value of 10 is the correct answer as the 3rd occurrence of the value 1
is at location 10 in the original vector since vectors are 0
based. The vector of indices is only used as book keeping
for faster search.
If you noticed I even made this a class template to accept any basic type T
that'll be stored in std::vector<T>
as long as T
is comparable, or has operators defined for it.
AFAIK I do not think that there is any other searching method that is faster than this for the type of search that you are striving for, but don't quote me on it. However I think I can optimize this code a little more... just need some time to look at it closer.
This may appear to be a bit crazy but this does work: just a bit of fun playing around with the code
int main() {
std::cout <<
RepititionSearch<char>( std::vector<char>( { -1, 0, 1, 1, -1, -1, 0, 0, 0, -1, 1 } ) ).getNthOccurrence( 3, 1 )
<< std::endl;
}
It can be done on a single line & printed to the console without creating an instance of class.
Now this may not necessarily make the algorithm faster, but this would clean up the code a bit for readability. Here I removed the typedefs, and just by using a single version of the Index
struct in the 3 if statements you will see duplicate
code so I decided to make a private helper function for that and this is how simple the algorithm looks for clear readability.
struct Index {
unsigned location;
unsigned count;
static unsigned counter;
Index() : location(0), count(0) {}
};
unsigned Index::counter = 0;
template<class T>
class RepitiionSearch {
public:
const T NEG { -1 };
const T ZERO { 0 };
const T POS { 1 };
private:
std::vector<T> data_;
std::vector<Index> indices_;
public:
explicit RepititionSearch( const std::vector<T>& data ) : data_( data )
indices_.clear();
}
unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
if ( NthOccurrence < 0 || NthOccurrence >= data.size() ) {
return -1;
}
indices_.clear();
Index index;
unsigned i = 0;
for ( auto e : data_ ) {
if ( element == e && element == NEG ) {
addIndex( index, i );
}
if ( element == e && element == ZERO ) {
addIndex( index, i );
}
if ( element == e && element == POS ) {
addIndex( index, i );
}
i++;
}
index.counter = 0;
for ( auto idx : indices_ ) {
if ( idx.count == NthOccurrence ) {
return idx.location;
}
}
return -1;
}
private:
void addIndex( Index& index, unsigned inc ) {
++index.counter;
index.location = inc;
index.count = index.counter;
indices_.push_back( index );
}
};
And to make this completely generic to find any Nth occurrence
of any element T
the above can be simplified and reduced down to this: I also removed the static counter from Index
and moved it to the private section of RepititionSearch
, it just made more sense to place it there.
struct Index {
unsigned location;
unsigned count;
Index() : location(0), count(0) {}
};
template<class T>
class RepititionSearch {
private:
static unsigned counter_;
std::vector<T> data_;
std::vector<Index> indices_;
public:
explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {
indices_.clear();
}
unsigned getNthOccurrence( unsigned NthOccurrence, T element ) {
if ( NthOccurrence < 0 || NthOccurrence >= data_.size() ) {
return -1;
}
indices_.clear();
Index index;
unsigned i = 0;
for ( auto e : data_ ) {
if ( element == e ) {
addIndex( index, i );
}
i++;
}
counter_ = 0;
for ( auto idx : indices_ ) {
if ( idx.count == NthOccurrence ) {
return idx.location;
}
}
return -1;
}
private:
void addIndex( Index& index, unsigned inc ) {
++counter_;
index.location = inc;
index.count = counter_;
indices_.push_back( index );
}
};
template<class T>
unsigned RepititionSearch<T>::counter_ = 0;
I have also done this same algorithm above without the need or dependency of needing a vector just to hold index information. This version doesn't need the Index
struct at all and doesn't need a helper function either. It looks like this:
template<class T>
class RepititionSearch {
private:
static unsigned counter_;
std::vector<T> data_;
public:
explicit RepititionSearch( const std::vector<T>& data ) : data_( data ) {}
unsigned getNthOcc( unsigned N, T element ) {
if ( N < 0 || N >= data_.size() ) {
return -1;
}
unsigned i = 0;
for ( auto e : data_ ) {
if ( element == e ) {
++counter_;
i++;
} else {
i++;
}
if ( counter_ == N ) {
counter_ = 0;
return i-1;
}
}
counter_ = 0;
return -1;
}
};
template<class T>
unsigned RepititionSearch<T>::counter_ = 0;
Since we were able to remove the dependency of the secondary vector and removed the need for a helper function; we don't even need a class at all to hold the container; we can just write a function template that takes a vector and apply the same algorithm. Also there is no need for a static counter with this version.
template<class T>
unsigned RepititionSearch( const std::vector<T>& data, unsigned N, T element ) {
if ( data.empty() || N < 0 || N >= data.size() ) {
return -1;
}
unsigned counter = 0;
unsigned i = 0;
for ( auto e : data ) {
if ( element == e ) {
++counter;
i++;
} else {
i++;
}
if ( counter == N ) {
return i - 1;
}
}
return -1;
}
Yes this is a lot to take in; but these are the steps that are involved in the process of writing and designing an algorithm and refining it down to simpler code. As you have seen I have refined this code about 5 times. I went from using a struct
, a class
, typedefs
, and a static member
with multiple stored containers, to removing the typedefs
and putting the repeatable code into a helper function, to removing the dependency of a secondary container & the helper function, down to not even needing a class at all and just creating a function that does what it is supposed to do.
You can apply a similar approach to these steps into building a function that does what you want or need it to do. You can use the same process to write a function that will do a binary search, hash table, etc.
Upvotes: 0
Reputation: 7905
The OP stated that the ordered structure is important and that the vector
or array
is unsorted
. To the best of my knowledge there is no faster search algorithm than linear for unsorted data. Here are a few links for references:
With the above links for references; this should be enough evidence to conclude that if the data in the array
or vector
is unsorted and must maintain its structure, then there is but no choice to use linear iteration, it may be possible to use a hashing technique, but that can still be tricky, using binary search will only work on sorted
data in most cases.
Nth
occurrence of T
in data
.
To solve your problem of finding the Nth
occurrence of element T
in a given unsorted
array
, vector
or container
you can use this simple function template:
N
where N
is the Nth
occurrence.T
that you are searching for.Nth
occurrence of element T
template<class T>
unsigned RepititionSearch( const std::vector<T>& data, const unsigned N, const T element ) {
if ( data.empty() || N < 0 || N >= data.size() ) {
return -1;
}
unsigned counter = 0;
unsigned i = 0;
for ( auto e : data ) {
if ( element == e ) {
++counter;
i++;
} else {
i++;
}
if ( counter == N ) {
return i - 1;
}
}
return -1;
}
Break down of the algorithm
- It first does some sanity checks:
- It checks to see if the container is empty
- It checks the value
N
to see if it is within bounds of[0,container.size())
- If any of these fail, it returns
-1
; in production code this might throw an exception or an error- We then have a need for 2 incrementing counters:
- 1 for the current index location
- 1 for the number of occurrences of element
T
- We then use a simplified for loop using
c++11
or higher
- We go through each
e
indata
- We check to see if the
element
passed into the function isequal to
the currente
indata
- If the check passes or is true we then
pre-increment
counter
andpost-increment
i
otherwise we only want topost-increment
i
- After incrementing the counters we then check to see if the current
counter
is equal to theNth
value passed into the function- If the check passes we return the value of
i-1
since containers are0
based- If the check fails here we then continue to the next iteration of the loop and repeat the process
- If after all
e
indata
has been checked and there are no occurrences ofT == e
orN != counter
then we leave the for loop and the function returns a-1
; in production code this might throw an exception or return an error.
The worst case scenario here is either there are no finds, or the Nth
occurrence of T
happens to be the very last e
in data
where this will yield O(N)
which is linear, and for basic containers this should be efficient enough. If the containers have array indexing
capabilities their item access should be O(1)
constant if you know which index location you want.
Note: This would be the answer that I feel should solve the problem, if you are interested in a breakdown of how the design process of designing or modeling such an algorithm works you can refer to my reference answer
here
AFAIK I do not think there is a better
way to do this with unsorted array data
, but don't quote me on it.
Upvotes: 0
Reputation: 77837
You cannot do this without at least one pass of O(N) pre-processing. From an standpoint of information theory alone, you must have knowledge of elements [0:k-1] to know whether element [k] is the one you want.
If you're going to make this search many times, then you can make a simple linear pass over the array, counting each element as you go. Store the indices in a 2-D array, so you can directly index whatever occurrence you want.
For instance, given [-1 0 1 1 -1 -1 0 0 0 -1 1], you can convert this to a 3xN array, idx
[[0 4 5 9]]
[[1 6 7 8]]
[[2 3 10]]
The Nth occurrence of element I is idx[I+1][N-1]
.
After that initial O(N) pass, your look-up is O(1) time, using O(N) space.
Upvotes: 10