Reputation: 5894
I am doing work to optimize a lookup on a data structure that has "almost" sorted data. I am fairly confident that the "almost" detail of it does not actually matter but am uncertain
The actual data structure is more complicated that what is necessary for SO so I have simplified it. The simplified version is std::vector<Level>
which has a Price, Bid, and Ask:
When I say generally I mean that the data has a long sequence of generally zeroes followed by meaningful values but some of the zeroes might actually be negative one. However, I would only be searching for positive values so all zeroes and negative ones are not meaningful return values
Below is the test data from my simplified program for SO:
// Price Bid Ask Index
levels.emplace_back(Level( 42.0, 0, 150)); // 0
levels.emplace_back(Level( 43.0, 0, 71)); // 1
levels.emplace_back(Level( 44.0, 0, 70)); // 2
levels.emplace_back(Level( 45.0, 0, 70)); // 3
levels.emplace_back(Level( 46.0, 0, 69)); // 4
levels.emplace_back(Level( 47.0, 0, 0)); // 5
levels.emplace_back(Level( 48.0, -1, -1)); // 6
levels.emplace_back(Level( 49.0, 0, 0)); // 7
levels.emplace_back(Level( 50.0, 80, 0)); // 8
levels.emplace_back(Level( 51.0, 81, 0)); // 9
levels.emplace_back(Level( 52.0, 81, 0)); // 10
levels.emplace_back(Level( 53.0, 82, 0)); // 11
levels.emplace_back(Level( 54.0, 201, 0)); // 12
When I search this structure for some Bid, "Seek Bid", I want to find the first Level's Price that has a Bid that is greater than or equal to "Seek Bid"
When I search this structure for some Ask, "Seek Ask", I want to find the last Level's Price that has an Ask that is greater than or equal to "Seek Ask"
Below is my simplified program for SO:
#include <algorithm>
#include <iostream>
#include <vector>
struct Level final {
Level() = delete;
Level(const double a_price, const int a_bid, const int a_ask) :
m_price(a_price),
m_bid (a_bid),
m_ask (a_ask)
{}
const double m_price;
const int m_bid;
const int m_ask;
};
int main(int argc, char** argv) {
if (argc != 3) {
std::cout << "Usage: " << argv[0] << " <Seek Bid> <Seek Ask>\n";
exit(1);
}
std::vector<Level> levels;
// Price Bid Ask Index
levels.emplace_back(Level( 42.0, 0, 150)); // 0
levels.emplace_back(Level( 43.0, 0, 71)); // 1
levels.emplace_back(Level( 44.0, 0, 70)); // 2
levels.emplace_back(Level( 45.0, 0, 70)); // 3
levels.emplace_back(Level( 46.0, 0, 69)); // 4
levels.emplace_back(Level( 47.0, 0, 0)); // 5
levels.emplace_back(Level( 48.0, -1, -1)); // 6
levels.emplace_back(Level( 49.0, 0, 0)); // 7
levels.emplace_back(Level( 50.0, 80, 0)); // 8
levels.emplace_back(Level( 51.0, 81, 0)); // 9
levels.emplace_back(Level( 52.0, 81, 0)); // 10
levels.emplace_back(Level( 53.0, 82, 0)); // 11
levels.emplace_back(Level( 54.0, 201, 0)); // 12
const int seekBid = atoi(argv[1]);
const int seekAsk = atoi(argv[2]);
std::cout << "Seek Bid: " << seekBid << ", Seek Ask: " << seekAsk << '\n';
if (seekBid <= 0 || seekAsk <= 0) {
std::cout << "Seek Bid or Seek Ask is not positive\n";
exit(1);
}
// If the last Level's Bid is < Seek Bid then what I am looking for doesn't exist
if (levels.back().m_bid < seekBid)
std::cout << "Cannot satisfy Seek Bid\n";
else {
// Find the first Level with a Bid <= Seek Bid
// Not sure why I need to specify < instead of <= but appears to work
const auto it = std::lower_bound(
levels.begin(),
levels.end(),
seekBid,
[](const Level& a_level, const int a_bid) { return a_level.m_bid < a_bid; }
);
std::cout << "Bid Price: " << it->m_price << ", Bid Index: " << &*it - &levels[0] << '\n';
}
// If the first Level's Ask is < Seek Ask then what I am looking for doesn't exist
if (levels.front().m_ask < seekAsk)
std::cout << "Cannot satisfy Seek Ask\n";
else {
// Find the last Level with Ask <= Seek Ask
// Need to use std::prev due to how std::upper_bound works
// Not sure why I need to specify < instead of <= but appears to work
const auto it = std::prev(std::upper_bound(
levels.begin(),
levels.end(),
seekAsk,
[](const int a_ask, const Level& a_level) { return a_level.m_ask < a_ask; }
));
std::cout << "Ask Price: " << it->m_price << ", Ask Index: " << &*it - &levels[0] << '\n';
}
return 0;
}
Below are some examples of running my test program for SO. The case where "Seek Bid" is 81 and "Seek Ask" is 70 is really important because there are two 81 Bids and two 70 Asks. It is important in the real program that the first 81 Bid and the last 70 Ask is found:
Seek Bid: 79, Seek Ask: 68
Bid Price: 50, Bid Index: 8
Ask Price: 46, Ask Index: 4
Seek Bid: 80, Seek Ask: 69
Bid Price: 50, Bid Index: 8
Ask Price: 46, Ask Index: 4
Seek Bid: 81, Seek Ask: 70
Bid Price: 51, Bid Index: 9
Ask Price: 45, Ask Index: 3
Seek Bid: 82, Seek Ask: 71
Bid Price: 53, Bid Index: 11
Ask Price: 43, Ask Index: 1
All of these results are correct however these are my questions:
std::lower_bound
or std::upper_bound
considering that I am only
searching for positive values? In other words, do the negative ones
cause any kind of undefined behavior given my search requirements?std::lower_bound
works on
en.cppreference.com and cplusplus.com are very confusing and I only
realized that using <
instead of <=
in my lambdas was "correct"
through trial and error. Why is it not "correct" to use <=
if I am
looking for the first / last Level that is <=
what I am searching
for?Upvotes: 3
Views: 137
Reputation: 1161
Almost all (ordered) stl containers rely on strict weak ordering. A strict weak ordering defines the relative position of elements in terms of precedence of one item over other.
Therefore, a strict weak ordering has the following properties:
If you want these STL containers and algorithms work as specified, your provided comparison has to provide this strict weak ordering.
references, more details:
https://en.cppreference.com/w/cpp/named_req/Compare
https://github.com/bashrc-real/Codearchive/blob/master/cpp/Strict_weak_ordering_and_stl.md
https://en.wikipedia.org/wiki/Weak_ordering
Upvotes: 3
Reputation: 26302
std::lower_bound
and std::upper_bound
perform simple binary search. They don't search for a particular element value, instead they search for a partition point. The range you apply std::lower_bound
to is not required to be sorted. The requirement is:
The range
[first, last)
must be partitioned with respect to the expressionelement < value
orcomp(element, value)
, i.e., all elements for which the expression istrue
must precede all elements for which the expression isfalse
.
Is it necessary for me to make all of the negative ones into zeroes before searching ...
No. Your range is always partitioned with respect to the expression element < value
if value
is positive.
Why is it not "correct" to use
<=
if I am looking for the first / last Level that is<=
what I am searching for?
Because std::lower_bound
relies on <
relation, not on <=
. Roughly speaking, it derives a <= b
from !(b < a)
.
Upvotes: 2
Reputation: 62719
The general requirement is described in Compare. There must be a single ordering such that groups of equivalent elements have a specific place in that order, using the provided comparison. lower_bound
and upper_bound
require that the input be in such an order.
Is it necessary for me to make all of the negative ones into zeroes before searching to guarantee correct results.
Not in this particular case, as it will only test Level
s against the given positive value, and not against each other. Your comp
treats 0
as equivalent to -1
, so it doesn't matter they are "out of order". It would be undefined behaviour to search for 0
or a negative number in this dataset.
Why is it not "correct" to use
<=
if I am looking for the first / last Level that is<=
what I am searching for?
Because that breaks the asymmetry requirement of a strict weak order. If you only want larger values, use upper_bound
.
Upvotes: 3