Reputation: 5152
A list of sorted integers are encoded in a uint8_t
vector.
The amount of bytes to encode these integers is arbitrary.
For example, if we use 3 bytes to encode each integer, then the first integer is encoded in the first 3 bytes, the second integer is encoded in the second 3 bytes and so on.
I would like to binary search an integer from this uint8_t
vector.
My attempt so far:
std::bsearch
seems what I want. In which, I can specify the size of each element in the array in bytes. However, to implement the comparator function, I need access to a variable that stores the size of each element in bytes. std::bsearch
is a C function, so the compiler does not allow me to access a class member variable that stores the size of each element in bytes. Global variable works, but it is ugly to store it as a global variable...
std::binary_search
takes an iterator.
I guess I need iterator to skip certain number of bytes.
I am not sure how to make this happen though.
Upvotes: 0
Views: 128
Reputation: 908
The object you want to compare is stored in several elements in the vector
. So std::binary_search
cannot be directly applied on the vector
.
One approach is applying std::bsearch
on the raw data of vector
(vecter::data()
) since std::bsearch
can group the data by specified size. So it will seem like:
std::bsearch((void*)key_array,(void*)data_vector.data(),data_vector.size()/3,3*sizeof(uint8_t),
[](const void *a, const void *b){return decode((uint8_t*)a)-decode((uint8_t*)b)});
Upvotes: 0
Reputation: 11220
The first thing that comes to mind for this is to define a custom iterator that wraps another iterator and operates on a sequence of N
bytes at a time. This would allow the wrapped iterator to be easily used in std::binary_search
without being stuck rewriting it.
Basically, the utility would have the following:
operator++
) operates on N
bytes at a time,operator--
) operates on N
bytes at a time,operator+
) advances by N * i
bytes at a time,operator-
) reports a distance of d / N
(to account for the N
bytes, andstd::int64_t
or something)Something like:
// Iterator that skips N entries over a byte type
template <std::size_t N, typename Iterator>
class byte_iterator
{
public:
// Ensure that the underlying type is 'char' (or some other 'byte' type, like std::byte)
static_assert(std::is_same<typename Iterator::value_type, unsigned char>::value, "");
using value_type = std::uint64_t;
using reference = std::uint64_t;
using pointer = value_type*;
using size_type = std::size_t;
using difference_type = std::ptrdiff_t;
using iterator_category = std::random_access_iterator_tag;
byte_iterator() = default;
explicit byte_iterator(Iterator underlying)
: m_it{underlying}
{
}
byte_iterator(const byte_iterator&) = default;
byte_iterator& operator=(const byte_iterator&) = default;
byte_iterator& operator++() {
std::advance(m_it, N);
return (*this);
}
byte_iterator operator++(int) {
auto copy = (*this);
std::advance(m_it, N);
return copy;
}
byte_iterator& operator--() {
std::advance(m_it, -static_cast<std::ptrdiff_t>(N));
return (*this);
}
byte_iterator operator--(int) {
auto copy = (*this);
std::advance(m_it, -static_cast<std::ptrdiff_t>(N));
return copy;
}
byte_iterator& operator+=(std::ptrdiff_t n) {
std::advance(m_it, static_cast<std::ptrdiff_t>(N) * n);
return (*this);
}
byte_iterator& operator-=(std::ptrdiff_t n) {
std::advance(m_it, -static_cast<std::ptrdiff_t>(N) * n);
return (*this);
}
value_type operator*() const {
// build your int here
// The question doesn't indicate endianness, so this is up to you
// this can just be a bunch of shifts / masks to create the int
}
bool operator==(const byte_iterator& other) const {
return m_it == other.m_it;
}
bool operator!=(const byte_iterator& other) const {
return m_it != other.m_it;
}
Iterator underlying() const { return m_it; }
private:
Iterator m_it;
};
template <typename N, typename Iter>
std::ptrdiff_t operator-(const byte_iterator<N, Iter>& lhs, const byte_iterator<N, Iter>& rhs) {
return (lhs.underlying() - rhs.underlying()) / 3;
}
template <typename N, typename Iter>
byte_iterator<N, Iter> operator+(const byte_iterator<N, Iter>& lhs, std::ptrdiff_t rhs) {
return byte_iterator{lhs} += rhs;
};
template <typename N, typename Iter>
byte_iterator<N, Iter> operator-(const byte_iterator<N, Iter>& lhs, std::ptrdiff_t rhs) {
return byte_iterator{lhs} -= rhs;
};
// other necessary operators...
template <std::size_t N, typename Iterator>
byte_iterator<N, std::decay_t<Iterator>> make_byte_iterator(Iterator it) {
return byte_iterator<n, std::decay_t<Iterator>>{it};
}
Note: N
doesn't have to be fixed at compile time as a template argument, it could also be done at runtime as a constructor argument. Similarly Iterator
doesn't have to be a templated type; I simply use this as an example so that it may work with any underlying iterator. Realistically you could simply make this a char*
or something so that it only works on contiguous byte arrays.
All that needs to be implemented is the reconstruction of the int for the purposes of comparison, which can be done from a series of 8-bit left shifts and masks depending on the endianness.
With an iterator wrapper like this, we can use std::lower_bound
for a simple binary search to find the iterator:
// Assuming 3-byte numbers:
const auto value = 42;
auto byte_numbers = std::vector<unsigned char>{...};
// ensure we have an increment of 3 bytes, otherwise the
// iterators will break
assert(byte_numbers.size() % 3 == 0);
auto result = std::lower_bound(
make_byte_iterator<3>(byte_numbers.begin()),
make_byte_iterator<3>(byte_numbers.end()),
value
);
The result
iterator will by a byte_iterator
that contains an underlying iterator to the start of the N-byte number you need to retrieve. If you need the underlying iterator, you can just call result.underlying()
Alternatively, this can be used in std::binary_search
to detect existence of the element in general without finding the underlying iterator (as mentioned in the question).
Note: The above code is untested -- it may have a typo or two, but the process should work as described.
Edit: Also be aware that this will only work correctly if the underlying iterator range is a multiple of N
! If it were not, then you will report the incorrect range, which will also cause advancing the iterator to potentially move past the end
iterator, which would be undefined behavior. It's important that you check the boundaries first before wrapping an iterator in this way.
Edit 2: If the N-byte integers can exceed a large value like int64_t
, but follow a simple heuristic, you can also make the value_type
of the iterator be a custom arbitrary_int
type that has a well-defined comparison operator. For example:
template <std::size_t N>
class arbitrary_int {
public:
arbitrary_int(const unsigned char* integer)
: m_int{integer}
{
}
// assuming we can lexicographically compare all bytes
bool operator==(const arbitrary_int<N>& rhs) const {
return std::equal(m_int, m_int + N, rhs.m_int, rhs.m_int + N);
}
bool operator!=(const arbitrary_int<N>& rhs) const {
return !(*this == rhs);
}
bool operator<(const arbitrary_int<N>& rhs) const {
return std::lexicographical_compare(m_int, m_int + N, rhs.m_int, rhs.m_int + N);
}
bool operator>(const arbitrary_int<N>& rhs) const {
return rhs < *this;
}
bool operator<=(const arbitrary_int<N>& rhs) const {
return !(rhs < *this);
}
bool operator>=(const arbitrary_int<N>& rhs) const {
return !(*this < *rhs);
}
private:
const unsigned char* m_int;
};
In this case, byte_iterator<N,Iterator>::operator*()
would now return
return arbitrary_int<N>{m_it.operator->()}
Or if you're using c++20,
return std::to_address(m_it);
Upvotes: 3