Reputation: 3272
I want to simplify the work with indexes using STL algorithms. Specifically, I have an array of indexes to another container in order, so that the indexes are sorted with respect to the corresponding values in the other container.
I want to make work with data through indexes transparent for programmer. For example, I want to use std::equal_range to find a range in data using indexes without writing a predicate for this. The predicate with only two purposes: to store the begin()
of data container and to provide code for indirect access.
I feel like I am reinventing the bicycle and STL should have something to solve this task, like it have predicates for std::less, std::greater, std::negate, etc. I expect that somebody answers “Just use predicate std::indirect_array and std::indirect_binary_predicate in this way” or something alike. I would prefer the solution which would work with usual std::vector containers.
What I want to avoid
In case I need to use STL algorithms on the indexed data, e.g. an array data with values and an array indexes with indexes, the algorithm refers to indexed data using both data container begin()
and index.
I prefer to keep the indexes instead of pointers/iterators mainly because I want to reduce memory footprint. For Winx64 platform each int index in takes 4 bytes, while a pointer takes 8 bytes. Additionally, indexes usage simplifies arrays relocation, fit for work with structures of arrays (SoA) and some other operations.
The problem breaks down into two:
I really can’t decouple these two, if somebody could, please provide the way to. The main reason is that when I would ask where and how to store the container begin()
people will point out to the predicate with internal state which is exactly the second part of the question.
When I talk about the transparency, I mean that I want to use algorithms like std::equal_range to find ranges in values, not in indexes, but using the indexes as an entry point. Unfortunately, std::equal_range knows nothing of my intentions and will compare just indexes. Of course, I could handle this in comparator predicate, but this will force me to write it and, in every predicate, remember about the indirection.
Saving memory, I have to pay back, keeping somewhere or providing the begin()
of the container.
So, most likely, I have to pass the begin()
somewhere to algorithms and the best place for this is the predicate:
struct Value {
int data;
public:
Value(int value) : data(value) {}
operator int() const { return data; }
bool operator < (const Value& rhs) const { return data < int(rhs); }
};
struct Index {
public:
int index;
public:
Index(int i) : index(i) {}
};
class Comparator {
const std::vector<Value>& vec;
public:
Comparator(const std::vector<Value>& vec) : vec(vec) {}
bool operator () (const Value& lhs, const Index& rhs) const {
return lhs < vec[rhs.index];
}
bool operator () (const Index& lhs, const Value& rhs) const {
return vec[lhs.index] < rhs;
}
};
int main()
{
std::vector<Value> v1 = { 0,30,20,40,10 };
std::vector<Index> i1 = { 0,4,2,1,3 };
Value value(30);
auto range_pred = std::equal_range(i1.begin(), i1.end(), value, Comparator(v1));
std::cout << std::endl << "With predicate:" << std::endl;
std::cout << "*range_pred.first = " << v1[(*range_pred.first).index] << std::endl;
std::cout << "*range_pred.second= " << v1[(*range_pred.second).index] << std::endl;
std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred.first, range_pred.second) << std::endl;
}
The key problems are:
The key questions
begin()
? Is the predicate with state the best solution?std::indirect
(fictional) could forget about this forever and work with indexes like with values.With the goals
I am sure that in modern C++/STL should be some ready solution.
I have dozens of indexes, dozens of predicates and many levels of indirection and want to simplify the code since in many places it becomes complex and at the same time, similar, although not identical. Most of it is indirection code.
Upvotes: 1
Views: 430
Reputation: 4896
Not entirely sure this is what you expected, but normally indexing is achieved via projection, as @Jarod42 points out in the comments on the question. You can also use a custom view. Both approaches are shown below, with two closely-related variants shown for the custom view approach.
#include <vector>
#include <concepts>
#include <ranges>
#include <algorithm>
#include <iostream>
struct Value {
int data;
public:
Value(int value) : data(value) {}
operator int() const { return data; }
bool operator < (const Value& rhs) const { return data < int(rhs); }
};
struct Index {
public:
int index;
public:
Index(int i) : index(i) {}
};
// Reusable generic for the standard approach using projections
// Added following @Jarod42's comment on this answer
template<std::ranges::random_access_range R>
auto create_index_projection(R&& range) {
return [&range](const Index& i){
return std::ranges::begin(range)[i.index];
};
}
// Reusable generic for the custom view approach
template<std::ranges::random_access_range R>
auto indexing(R&& range){
return std::views::transform([&range](const Index& i) -> decltype(auto) {
return std::ranges::begin(range)[i.index];
});
};
// Variant of the custom view approach, with the operands swapped
// Requires more work, but the order may be more "natural" and easier for composition
template<std::ranges::random_access_range I>
requires std::same_as<std::ranges::range_value_t<I>, Index>
struct IndexObject {
I&& indices;
explicit IndexObject(I&& indices_):indices{std::forward<I>(indices_)}{}
};
template <typename I>
IndexObject<I> indexedby(I&& indices_){
return IndexObject<I>{std::forward<I>(indices_)};
}
template <std::ranges::random_access_range R, std::ranges::random_access_range I>
auto operator | (R&& range, IndexObject<I> index_obj){
return std::forward<I>(index_obj.indices) | indexing(std::forward<R>(range));
}
int main()
{
std::vector<Value> v1 = { 0,30,20,40,10 };
std::vector<Index> i1 = { 0,4,2,1,3 };
Value value(30);
//Standard approach: Using projection
auto range_pred1 = std::ranges::equal_range(i1, value, {}, create_index_projection(v1));
std::cout << "*range_pred.first = " << v1[(*range_pred1.begin()).index].data << std::endl;
std::cout << "*range_pred.second= " << v1[(*range_pred1.end()).index].data << std::endl;
std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred1.begin(), range_pred1.end()) << std::endl;
//Alternatve approach: Using custom view
//May be the syntax you are looking for
auto indexing_view = i1 | indexing(v1);
auto range_pred2 = std::ranges::equal_range(indexing_view, value);
std::cout << "*range_pred.first = " << (*range_pred2.begin()).data << std::endl;
std::cout << "*range_pred.second= " << (*range_pred2.end()).data << std::endl;
std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred2.begin(), range_pred2.end()) << std::endl;
//Variant of above, with more "natural" order in syntax
auto indexing_view2 = v1 | indexedby(i1);
auto range_pred3 = std::ranges::equal_range(indexing_view2, value);
std::cout << "*range_pred.first = " << (*range_pred3.begin()).data << std::endl;
std::cout << "*range_pred.second= " << (*range_pred3.end()).data << std::endl;
std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred3.begin(), range_pred3.end()) << std::endl;
//Also works for rvalues provided that std::ranges::dangling is not returned by the algorithm
bool binary_search_result1 = std::ranges::binary_search(std::vector<Value>{0,20,10}|indexedby(std::vector<Index>{0,2,1}), Value{10});
bool binary_search_result2 = std::ranges::binary_search(std::vector<Value>{0,20,10}|indexedby(std::vector<Index>{0,2,1}), Value{15});
std::cout << std::boolalpha;
std::cout << std::endl << "binary search for 10 " << binary_search_result1 << std::endl;
std::cout << "binary search for 15 " << binary_search_result2 << std::endl;
}
Demo: https://godbolt.org/z/389Y99Mnz
The adaptors indexing
and indexedby
are aimed at code deduplication. You only need to define once, and then use either index_range | indexing(indexed_range)
or indexed_range | indexedby(index_range)
to obtain a view which you can almost use as a reordered range. Only one of the two variants need to be defined so you can choose whichever syntax that you find the most natural. This addresses the transparent requirement in your question. So let's say you have the following
std::string s="abc";
std::array<Index> indices[]{1,2,0};
auto sView = s | indexedby(indices);
//Or, equivalently
//auto sView = indices | indexing(s);
Note that now indices
and s
are of completely different types from above, but with the same definition of the adaptors sView
can be used as a random access range containing "bca". It is random access because the underlying range indices
is also so. That means for example sView[2]=='a'
and for(const auto c: sView){...}
does what you expect them to do (the latter does not really require a random access range though), which is good enough for most applications.
It is also composable. So for example if you have another level of indirection using a index vector i2
, in addition to i1
and v1
, then you can just use the view auto view = v1 | indexedby(i1) | indexedby(i2)
to access the elements transparently. Due to the restriction that overloaded []
must be a member function, a syntax like v1[indexedby(i1)][indexedby(i2)]
is not possible.
Some more explanation about the definition of the IndexObject
template: it is a thin wrapper (the extra cost in most C++ implementations being the memory of a single pointer) around a range of Index
(e.g. std::vector<Index>
in your example), and indexedby
is actually just a more descriptive name for the make
function of an IndexObject
, it can just be called makeIndexObject
just like make_pair
is to std::pair
.
Now, this thin wrapper around the original range of Index
makes sure that the correct operator |
is picked up for the syntax v1 | indexedby(i1)
. It could have just been v1 | i1
, by just changing the signature of operator|
, but I have decided that that is too error-prone and does not convey the intention well, so I have chosen to make this extra wrapper.
So to summarize its purpose in a sentence: it is to wrap (capture) an existing range of Index
to make the syntax v1 | indexedby(i1)
possible. In the implementation of the corresponding operator |
of many range adaptors, this is called a closure.
For the "how it works" part. Most of the information are already presented above, but I can rephrase it in a way corresponding to the natural program flow.
indexedby
wraps (captures) a range of Index
, which is an IndexObject
.std::transform_view
.std::transform_view
can then be used for accessing transparently.Conceptually, it is not much different from (Capture(i1))(v1) -> std::transform_view<decltype(i1), F>
.
At this point it may be tempting to get rid of the classes Value
and Index
entirely, since the original intent is preventing the misuse of an index vector as a value one. Doing so is in fact perfectly safe in terms of type safety. The reason is as follows.
For the indexedby
adaptor, since a range cannot be converted to an IndexObject
thanks to the explicit
constructor, and an IndexObject
is not a valid range itself, it is now crystal clear that for the values you have to use a real range and for indices you have to use an IndexObject
. There's no way they can get mixed up in the type system. Similarly for the indexing
adaptor, for the values you have to use a range adaptor and for the indices you have to use a real range, and of course the standard library doesn't allow implicit conversions between them.
The following snippet from this demo clearly shows the type safety. For the definitions of the types involved in the demo, refer to the link to the demo itself. The demo is adapted from another demo provided by OP.
namespace TypeSafetyDemo{
namespace Approach1{
using Value = ValueObject<std::vector<int>&>;
using Index = std::vector<int>;
using F = void(*)(Value, Index);
static_assert(std::invocable<F, Value, Index>);
static_assert(!std::invocable<F, Value, Value>);
static_assert(!std::invocable<F, Index, Value>);
static_assert(!std::invocable<F, Index, Index>);
}
namespace Approach2{
using Value = std::vector<int>;
using Index = IndexObject<std::vector<int>&>;
using F = void(*)(Value, Index);
static_assert(std::invocable<F, Value, Index>);
static_assert(!std::invocable<F, Value, Value>);
static_assert(!std::invocable<F, Index, Value>);
static_assert(!std::invocable<F, Index, Index>);
}
}
An additional note is that since both adaptors capture the underlying range by reference, the usual warns about lifetime and reference members apply. Don't store an indexedby(i1)
somewhere and use it after the lifetime of i1
ends. For a similar reason, having a function signature of f(std::vector<int>, IndexObject<std::vector<int>&&> index_obj)
is dangerous and not very useful, as the underlying vector
will be moved to the owning_view
in the first use of index_obj
, and subsequent uses of index_obj
will be invalid. I've also modified the template IndexObject
to make IndexObject<std::vector<int>>
ill-formed to prevent misuses similar to this.
If you really want to make f(std::vector<int>, IndexObject<std::vector<int>&&>)
useful and safe, here provides an alternative. It only moves the underlying vector
if both that vector
and the IndexObject
itself are rvalues.
About the suggestion from a now deleted comment: R range
with random_access_range R
does not guarantee range[i.index]
is well-formed. In fact the concept does not say much about the member functions of R
itself, only about its iterators. So in order to be general enough it is still necessary to use std::ranges::begin(range)[i.index]
. Although all random access ranges in the standard library do provide operator[]
.
Upvotes: 3