asimes
asimes

Reputation: 5894

Questions about std::lower_bound and std::upper_bound

I am doing work to optimize a lookup on a data structure that has "almost" sorted data. I am fairly confident that the "almost" detail of it does not actually matter but am uncertain

The actual data structure is more complicated that what is necessary for SO so I have simplified it. The simplified version is std::vector<Level> which has a Price, Bid, and Ask:

When I say generally I mean that the data has a long sequence of generally zeroes followed by meaningful values but some of the zeroes might actually be negative one. However, I would only be searching for positive values so all zeroes and negative ones are not meaningful return values

Below is the test data from my simplified program for SO:

//                        Price  Bid  Ask    Index
levels.emplace_back(Level( 42.0,   0, 150)); //  0
levels.emplace_back(Level( 43.0,   0,  71)); //  1
levels.emplace_back(Level( 44.0,   0,  70)); //  2
levels.emplace_back(Level( 45.0,   0,  70)); //  3
levels.emplace_back(Level( 46.0,   0,  69)); //  4
levels.emplace_back(Level( 47.0,   0,   0)); //  5
levels.emplace_back(Level( 48.0,  -1,  -1)); //  6
levels.emplace_back(Level( 49.0,   0,   0)); //  7
levels.emplace_back(Level( 50.0,  80,   0)); //  8
levels.emplace_back(Level( 51.0,  81,   0)); //  9
levels.emplace_back(Level( 52.0,  81,   0)); // 10
levels.emplace_back(Level( 53.0,  82,   0)); // 11
levels.emplace_back(Level( 54.0, 201,   0)); // 12

When I search this structure for some Bid, "Seek Bid", I want to find the first Level's Price that has a Bid that is greater than or equal to "Seek Bid"

When I search this structure for some Ask, "Seek Ask", I want to find the last Level's Price that has an Ask that is greater than or equal to "Seek Ask"

Below is my simplified program for SO:

#include <algorithm>
#include <iostream>
#include <vector>

struct Level final {
    Level() = delete;
    Level(const double a_price, const int a_bid, const int a_ask) :
        m_price(a_price),
        m_bid  (a_bid),
        m_ask  (a_ask)
    {}

    const double m_price;
    const int    m_bid;
    const int    m_ask;
};

int main(int argc, char** argv) {
    if (argc != 3) {
        std::cout << "Usage: " << argv[0] << " <Seek Bid> <Seek Ask>\n";
        exit(1);
    }

    std::vector<Level> levels;

    //                        Price  Bid  Ask    Index
    levels.emplace_back(Level( 42.0,   0, 150)); //  0
    levels.emplace_back(Level( 43.0,   0,  71)); //  1
    levels.emplace_back(Level( 44.0,   0,  70)); //  2
    levels.emplace_back(Level( 45.0,   0,  70)); //  3
    levels.emplace_back(Level( 46.0,   0,  69)); //  4
    levels.emplace_back(Level( 47.0,   0,   0)); //  5
    levels.emplace_back(Level( 48.0,  -1,  -1)); //  6
    levels.emplace_back(Level( 49.0,   0,   0)); //  7
    levels.emplace_back(Level( 50.0,  80,   0)); //  8
    levels.emplace_back(Level( 51.0,  81,   0)); //  9
    levels.emplace_back(Level( 52.0,  81,   0)); // 10
    levels.emplace_back(Level( 53.0,  82,   0)); // 11
    levels.emplace_back(Level( 54.0, 201,   0)); // 12

    const int seekBid = atoi(argv[1]);
    const int seekAsk = atoi(argv[2]);
    std::cout << "Seek Bid: " << seekBid << ", Seek Ask: " << seekAsk << '\n';

    if (seekBid <= 0 || seekAsk <= 0) {
        std::cout << "Seek Bid or Seek Ask is not positive\n";
        exit(1);
    }

    // If the last Level's Bid is < Seek Bid then what I am looking for doesn't exist
    if (levels.back().m_bid < seekBid)
        std::cout << "Cannot satisfy Seek Bid\n";
    else {
        // Find the first Level with a Bid <= Seek Bid
        // Not sure why I need to specify < instead of <= but appears to work
        const auto it = std::lower_bound(
            levels.begin(),
            levels.end(),
            seekBid,
            [](const Level& a_level, const int a_bid) { return a_level.m_bid < a_bid; }
        );
        std::cout << "Bid Price: " << it->m_price << ", Bid Index: " << &*it - &levels[0] << '\n';
    }

    // If the first Level's Ask is < Seek Ask then what I am looking for doesn't exist
    if (levels.front().m_ask < seekAsk)
        std::cout << "Cannot satisfy Seek Ask\n";
    else {
        // Find the last Level with Ask <= Seek Ask
        // Need to use std::prev due to how std::upper_bound works
        // Not sure why I need to specify < instead of <= but appears to work
        const auto it = std::prev(std::upper_bound(
            levels.begin(),
            levels.end(),
            seekAsk,
            [](const int a_ask, const Level& a_level) { return a_level.m_ask < a_ask; }
        ));
        std::cout << "Ask Price: " << it->m_price << ", Ask Index: " << &*it - &levels[0] << '\n';
    }

    return 0;
}

Below are some examples of running my test program for SO. The case where "Seek Bid" is 81 and "Seek Ask" is 70 is really important because there are two 81 Bids and two 70 Asks. It is important in the real program that the first 81 Bid and the last 70 Ask is found:

Seek Bid: 79, Seek Ask: 68
Bid Price: 50, Bid Index: 8
Ask Price: 46, Ask Index: 4

Seek Bid: 80, Seek Ask: 69
Bid Price: 50, Bid Index: 8
Ask Price: 46, Ask Index: 4

Seek Bid: 81, Seek Ask: 70
Bid Price: 51, Bid Index: 9
Ask Price: 45, Ask Index: 3

Seek Bid: 82, Seek Ask: 71
Bid Price: 53, Bid Index: 11
Ask Price: 43, Ask Index: 1

All of these results are correct however these are my questions:

Upvotes: 3

Views: 137

Answers (3)

skeller
skeller

Reputation: 1161

Almost all (ordered) stl containers rely on strict weak ordering. A strict weak ordering defines the relative position of elements in terms of precedence of one item over other.

Therefore, a strict weak ordering has the following properties:

  • For all x in S, it is not the case that x < x (irreflexivity).
  • For all x, y in S, if x < y then it is not the case that y < x (asymmetry).
  • For all x, y, z in S, if x < y and y < z then x < z (transitivity).
  • For all x, y, z in S, if x is incomparable with y (neither x < y nor y < x hold), and y is incomparable with z, then x is incomparable with z (transitivity of incomparability).

If you want these STL containers and algorithms work as specified, your provided comparison has to provide this strict weak ordering.

references, more details:

https://en.cppreference.com/w/cpp/named_req/Compare

https://github.com/bashrc-real/Codearchive/blob/master/cpp/Strict_weak_ordering_and_stl.md

https://en.wikipedia.org/wiki/Weak_ordering

Upvotes: 3

Evg
Evg

Reputation: 26302

std::lower_bound and std::upper_bound perform simple binary search. They don't search for a particular element value, instead they search for a partition point. The range you apply std::lower_bound to is not required to be sorted. The requirement is:

The range [first, last) must be partitioned with respect to the expression element < value or comp(element, value), i.e., all elements for which the expression is true must precede all elements for which the expression is false.


Is it necessary for me to make all of the negative ones into zeroes before searching ...

No. Your range is always partitioned with respect to the expression element < value if value is positive.

Why is it not "correct" to use <= if I am looking for the first / last Level that is <= what I am searching for?

Because std::lower_bound relies on < relation, not on <=. Roughly speaking, it derives a <= b from !(b < a).

Upvotes: 2

Caleth
Caleth

Reputation: 62719

The general requirement is described in Compare. There must be a single ordering such that groups of equivalent elements have a specific place in that order, using the provided comparison. lower_bound and upper_bound require that the input be in such an order.

Is it necessary for me to make all of the negative ones into zeroes before searching to guarantee correct results.

Not in this particular case, as it will only test Levels against the given positive value, and not against each other. Your comp treats 0 as equivalent to -1, so it doesn't matter they are "out of order". It would be undefined behaviour to search for 0 or a negative number in this dataset.

Why is it not "correct" to use <= if I am looking for the first / last Level that is <= what I am searching for?

Because that breaks the asymmetry requirement of a strict weak order. If you only want larger values, use upper_bound.

Upvotes: 3

Related Questions