Aleph0
Aleph0

Reputation: 6084

Shortest way to obtain a sublist of a sorted list of values lying in a certain interval

Today I was asking myself what might be the shortest possible code to obtain all values in a sorted vector std::vector<double>, which are greater or equal to a and smaller or equal to b.

My first approach was something like the following:

#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>

// Returns all values in sortedValues being greater equal start and smaller equal end;
std::vector<double> cutValues(const std::vector<double>& sortedValues, double start, double end) {
    std::vector<double> ret;

    auto startIter=std::lower_bound(sortedValues.begin(), sortedValues.end(), start);
    auto stopIter = std::upper_bound(sortedValues.begin(), sortedValues.end(), end);
    std::copy(startIter, stopIter, std::back_inserter(ret));
    return ret;
}

int main(int argc, char **args) {
    {
        auto ret = cutValues({ 0.1,0.2,0.3 }, 0.1, 0.3);
        std::copy(ret.begin(), ret.end(), std::ostream_iterator<double>(std::cout, ","));
        std::cout << std::endl;
    }
    {
        auto ret = cutValues({ 0.12,0.2,0.31 }, 0.1, 0.3);
        std::copy(ret.begin(), ret.end(), std::ostream_iterator<double>(std::cout, ","));
        std::cout << std::endl;
    }
    {
        auto ret = cutValues({ 0.1,0.2,0.3 }, 0.2, 0.2);
        std::copy(ret.begin(), ret.end(), std::ostream_iterator<double>(std::cout, ","));
        std::cout << std::endl;
    }
}

My second thought was something very simple like the following:

std::vector<double> cutValues2(const std::vector<double>& sortedValues, double start, double end) {
    std::vector<double> ret;
    std::copy_if(sortedValues.begin(), sortedValues.end(), std::back_inserter(ret), [&start, &end](auto v) { return v >= start && v <= end; });
    return ret;
}

But considering the case of only removing small portions from a very large vector it might have some efficiency issues.

Now I'm asking myself, if there there are even better ways to do it?

Upvotes: 8

Views: 1090

Answers (4)

Matteo Italia
Matteo Italia

Reputation: 126787

Not shorter, but generally more efficient: use std::lower_bound to find the start of the "interesting" interval, and go on copying as long as the elements are less than your maximum value; that way you are locating quickly (O(log N)) the start even for big vectors and you don't waste time doing a binary search again looking for the end of the interval - you are going to find it anyway during the copy loop.

The possibly extra compares are virtually free anyway, as double compares themselves are extremely cheap, the branch is well predicted (it's always the same until the end of the copy) and depends on data already in cache; compare with an extra binary search, which "jumps around" in memory (worse cache locality) and has less predictable branches.

std::vector<double> cutValues(const std::vector<double>& sortedValues, double minv, double maxv) {
    std::vector<double> ret;
    auto end = sortedValues.end();
    for(auto it = std::lower_bound(sortedValues.begin(), end, minv);
        it != end && *it <= max; ++it) {
        ret.push_back(*it);
    }
    return ret;
}

This can be rewritten in a similar way to your cutValues2 using STL algorithms, but I'm not sure it's much of an improvement.

std::vector<double> cutValues(const std::vector<double>& sortedValues, double minv, double maxv) {
    std::vector<double> ret;
    std::copy_if(
        std::lower_bound(sortedValues.begin(), sortedValues.end(), minv),
        sortedValues.end(),
        std::back_inserter(ret),
        [=](double v) { return v <= maxv; });
    return ret;
}

Upvotes: 2

muXXmit2X
muXXmit2X

Reputation: 2765

Once you know where your interval starts, there is no need to look through the hole vector again to find the end iterator. Start searching after the start_iterator instead

auto startIter = std::lower_bound(sortedValues.begin(), sortedValues.end(), start);
if (startIter == sortedValues.end()) {
    // either the last element was inside the interval (return it) or not (return empty vector)
    return *(--startIter) == start ? {start} : {}; 
}
auto stopIter = std::upper_bound(++startIter, sortedValues.end(), end);
                                 // start from here
std::copy(--startIter, stopIter, std::back_inserter(ret));    

Other optimizations would be content dependent. E.g. If you know that the last element inside your interval would be rather at the end of the vector it would make more sense to iterate over it backwards.

You may also consider checking the interval first to prevent unnecessary execution if a is greater or equal to b.

Upvotes: 0

DAle
DAle

Reputation: 9117

A bit changed first version:

std::vector<double> cutValues(const std::vector<double>& sortedValues, double start, double end) {
    auto startIter = std::lower_bound(sortedValues.begin(), sortedValues.end(), start);
    auto stopIter = std::upper_bound(startIter, sortedValues.end(), end);
    return std::vector<double>(startIter, stopIter);
}

Upvotes: 6

Felix Glas
Felix Glas

Reputation: 15524

You could write your function STL-style and make it work with any container of any type (and it will be more elegant imho).

template <typename ForwardIt, typename T>
auto bounded_range(ForwardIt first, ForwardIt last, T lb, T ub) {
    return std::pair{std::lower_bound(first, last, lb), std::upper_bound(first, last, ub)};
}
std::vector<double> v1{0.1, 0.1, 0.2, 0.3, 0.3, 0.5, 0.7, 0.9};

auto [beg, end] = bounded_range(std::begin(v), std::end(v), 0.3, 0.5);
std::vector<double> v2{beg, end};

// Or using 'std::make_from_tuple'.
auto v3 = std::make_from_tuple<std::vector<double>>(bounded_range(std::begin(v), std::end(v), 0.3, 0.5));

Note: examples uses C++17 template argument deduction for class templates, and structured bindings.

Upvotes: 1

Related Questions