Hunar
Hunar

Reputation: 349

C++ efficient finding of the first nearest matching value in a vector?

Given the unsorted vector {6.0, 3.02, 4.2, 5.3} and given a threshold of 0.1, how can I efficiently find the first match to the value 3 (for example) within the given threshold in C++ ? My current implementation is as follows but it is of complexity O(n). I want to improve that to O(log n) if possible. Thanks a lot in advance

std::vector<double> array = {6.0, 3.02, 4.2, 5.3};  
double val = 3 // the to be found value within the array above
double thresh = 0.1; // max threshold of the matching value
double found; // the matching value
for (int i = 0; i < array.size(); i++){
    if ( abs(array[i] - val) < thresh){
        found = array[i];
    }
}

The output should be 3.02 because it is the first closest match to 3 in the given array within the allowed threshold of 0.1

EDIT: If I can afford sorting the vector upfront, how can I re-implement the above search to be O(log n)? Thanks

Upvotes: 1

Views: 1842

Answers (3)

Marcelino
Marcelino

Reputation: 331

I think it cant be done. The best you can improve the search in a sorted array is O(log(n)) using binary search. But in an unsorted array you eventually must go through all the array items, and this is O(n)

Upvotes: 0

Francisco Geiman
Francisco Geiman

Reputation: 400

As sad by others, you can't do better than O(n) search without sorting the array.

If we sort the array first, we can do binary search and adopt a new strategy.

We need to find out which one is the first value in the array that satisfies (array[pos] >= (value - threshold) ). If we can find such value, then we check if it is inside the range [value - threshold, value + threshold]. If it is we return it, otherwise we don't.

Below is how I would implement with the sorting, using C++.

#include <vector>
#include <algorithm>
#include <math.h>
#include <limits>
#include <iostream>
#include <iterator>

double binarySearch(std::vector<double>& array, const double value, const double threshold) 
{
    // I assume here that the array is sorted ...
    // If I cannot find it, I will return infinity (:

    double returnValue = std::numeric_limits<double>::infinity();

    std::vector<double>::iterator it = std::lower_bound(array.begin(), array.end(), value - threshold);

    if(it != array.end() ) 
    {
        if(fabs(*it - value) <= threshold ) returnValue = *it;
    }

    return returnValue;
}



int main() 
{
    std::vector<double> array = {6.0, 3.02, 4.2, 5.3};    
    double val = 3.0;
    double threshold = 0.1;

    // Sorting the array
    std::sort(array.begin(), array.end() );
    double res = binarySearch(array, val, threshold);

    if(res != std::numeric_limits<double>::infinity() )
    {
        std::cout << res << std::endl;
    }
    else std::cout << "Desired value not found" << std::endl;

    return 0;
}

Upvotes: 1

CodeMouse92
CodeMouse92

Reputation: 6898

You're performing a linear search, which is definitely O(n). However, that is unfortunately the fastest search algorithm for an unsorted array/vector.

Therefore, to get something faster, you're going to need to first sort the vector. Do this up front, one time, or your resulting code will actually be slower than linear search. std::sort() is reasonably efficient - although there are a handful of faster sorting algorithms, if you care to find one. Be sure you're actually storing the sorted vector, either in place, or in a new variable, depending on your needs. You do not want to have to sort the data more than once.

Then, you can use a binary search algorithm to locate the value. std::lower_bound or std::upper_bound will probably suit your needs (thanks Eric for that note). Otherwise, if you're using standard binary search, even if the exact match isn't found, it'll put you in the ballpark where you are looking at two or three values, one of which is definitely your match.

Now, as Eric pointed out in comments, sorting does cost more than a linear search, so if you're only searching that data set once, you already have the most efficient approach.


EDIT: In comments, the OP described needing to add new data to the vector on occasion. This is a fairly simple issue to resolve: simply use a binary search to find where the new value belongs in the sorted vector, and insert it there.

Upvotes: 5

Related Questions