pseyfert
pseyfert

Reputation: 3613

lower_bound for more than one value in c++

Suppose I have a sorted vector of numbers from 0 to 1. I want to know the indices, where values become larger than multiples of 0.1 (i.e. the deciles. in the future maybe also percentiles).

A simple solution I have in mind is using std::lower_bound:

std::vector<float> v;

/// something which fills the vector here

std::sort(v.begin(),v.end());

std::vector<float>::iterator i = v.begin();
for (float k = 0.1 ; k < 0.99 ; k+= 0.1) {
  i = std::lower_bound (v.begin(), v.end(), k);
  std::cout << "reached " << k << " at position " << (low-v.begin()) << std::endl;
  std::cout << "       going from " << *(low-1) << " to " << *low << std::endl;
  // for simplicity of the example, I don't check if low is the first item of the vector
}

Since the vector can be long, I was wondering if this can be made faster. A first optimisation is to not search the part of the vector below the previous decile:

i = std::lower_bound (i, v.end(), k);

But, assuming lower_bound performs a binary search, this still scans the entire upper part of the vector for each decile over and over again and doesn't use the intermediate results from the previous binary search.

So ideally I would like to use a search function to which I can pass multiple search items, somehow like:

float searchvalues[9];
for (int k = 1; k <= 9 ; ++k) {
  searchvalues[k] = ((float)k)/10.;
}
int deciles[9] = FANCY_SEARCH(v.begin(),v.end(),searchvalues,9);

is there anything like this already around and existing in standard, boost, or other libraries?

Upvotes: 0

Views: 689

Answers (2)

Jarod42
Jarod42

Reputation: 217810

To be in O(log n), you may use the following:

void fill_range(
    std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u>& ranges,
    const std::vector<float>& v,
    std::size_t b,
    std::size_t e)
{
    if (b == e) {
        return;
    }
    int decile_b = v[b] / 0.1f;
    int decile_e = v[e - 1] / 0.1f;

    if (decile_b == decile_e) {
        auto& range = ranges[decile_b];

        if (range) {
            range->first = std::min(range->first, b);
            range->second = std::max(range->second, e);
        } else {
            range = std::make_pair(b, e);
        }
    } else {
        std::size_t mid = (b + e + 1) / 2;
        fill_range(ranges, v, b, mid);
        fill_range(ranges, v, mid, e);
    }
}

std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u>
decile_ranges(const std::vector<float>& v)
{
    // assume sorted `v` with value x: 0 <= x < 1
    std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u> res;
    fill_range(res, v, 0, v.size());
    return res;
}

Live Demo

but a linear search seems simpler

auto last = v.begin();
for (int i = 0; i != 10; ++i) {
    const auto it = std::find_if(v.begin(), v.end(),
                                 [i](float f) {return f >= (i + 1) * 0.1f;});
    // ith decile ranges from `last` to `it`;

    last = it;
}

Upvotes: 0

Bathsheba
Bathsheba

Reputation: 234785

There isn't anything in Boost or the C++ Standard Library. Two choices for an algorithm, bearing in mind that both vectors are sorted:

  1. O(N): trundle through the sorted vector, considering the elements of your quantile vector as you go.

  2. O(Log N * Log M): Start with the middle quantile. Call lower_bound. The result of this becomes the higher iterator in a subsequent lower_bound call on the set of quantiles below that pivot and the lower iterator in a subsequent lower_bound call on the set of quantiles above that pivot. Repeat the process for both halves.

For percentiles, my feeling is that (1) will be the faster choice, and is considerably simpler to implement.

Upvotes: 0

Related Questions