Transparent indexes for STL algorithms in C++

Question

I want to simplify the work with indexes using STL algorithms. Specifically, I have an array of indexes to another container in order, so that the indexes are sorted with respect to the corresponding values in the other container.

I want to make work with data through indexes transparent for programmer. For example, I want to use std::equal_range to find a range in data using indexes without writing a predicate for this. The predicate with only two purposes: to store the begin() of data container and to provide code for indirect access.

I feel like I am reinventing the bicycle and STL should have something to solve this task, like it have predicates for std::less, std::greater, std::negate, etc. I expect that somebody answers “Just use predicate std::indirect_array and std::indirect_binary_predicate in this way” or something alike. I would prefer the solution which would work with usual std::vector containers.

What I want to avoid

In case I need to use STL algorithms on the indexed data, e.g. an array data with values and an array indexes with indexes, the algorithm refers to indexed data using both data container begin() and index.

I prefer to keep the indexes instead of pointers/iterators mainly because I want to reduce memory footprint. For Winx64 platform each int index in takes 4 bytes, while a pointer takes 8 bytes. Additionally, indexes usage simplifies arrays relocation, fit for work with structures of arrays (SoA) and some other operations.

The problem breaks down into two:

Representation of the indexes
Providing transparence for algorithms

I really can’t decouple these two, if somebody could, please provide the way to. The main reason is that when I would ask where and how to store the container begin() people will point out to the predicate with internal state which is exactly the second part of the question.

When I talk about the transparency, I mean that I want to use algorithms like std::equal_range to find ranges in values, not in indexes, but using the indexes as an entry point. Unfortunately, std::equal_range knows nothing of my intentions and will compare just indexes. Of course, I could handle this in comparator predicate, but this will force me to write it and, in every predicate, remember about the indirection.

Saving memory, I have to pay back, keeping somewhere or providing the begin() of the container.

So, most likely, I have to pass the begin() somewhere to algorithms and the best place for this is the predicate:

struct Value {
    int data;
public:
    Value(int value) : data(value) {}
    operator int() const { return data; }

    bool operator < (const Value& rhs) const { return data < int(rhs); }
};

struct Index {
public:
    int index;
public:
    Index(int i) : index(i) {}
};

class Comparator {
    const std::vector& vec;
public:
    Comparator(const std::vector& vec) : vec(vec) {}

    bool operator () (const Value& lhs, const Index& rhs) const {
        return lhs < vec[rhs.index];
    }
    bool operator () (const Index& lhs, const Value& rhs) const {
        return vec[lhs.index] < rhs;
    }
};


int main()
{
    std::vector v1 = { 0,30,20,40,10 };
    std::vector i1 = { 0,4,2,1,3 };

    Value value(30);

    auto range_pred = std::equal_range(i1.begin(), i1.end(), value, Comparator(v1));

    std::cout << std::endl << "With predicate:" << std::endl;
    std::cout << "*range_pred.first = " << v1[(*range_pred.first).index] << std::endl;
    std::cout << "*range_pred.second= " << v1[(*range_pred.second).index] << std::endl;

    std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred.first, range_pred.second) << std::endl;
}

The key problems are:

I have to provide the predicate every time, even for simple indirection, because it is the only place to store the reference to data vector.
If the user would forget to add predicate, the error messages will be killing.

The key questions

Can this be done simpler in modern C++?
Where is the best place to store the containers begin()? Is the predicate with state the best solution?
How to make this transparent, so that the developer using somewhere something like std::indirect (fictional) could forget about this forever and work with indexes like with values.

With the goals

Robustness: when a user could forget to provide the predicate and start getting crazy amount of STL errors, this doesn’t work; when some implicit conversions could ruin indirection, this doesn’t work, etc.
Readability and avoiding code duplication.
Memory footprint with an eye on performance.

I am sure that in modern C++/STL should be some ready solution.

I have dozens of indexes, dozens of predicates and many levels of indirection and want to simplify the code since in many places it becomes complex and at the same time, similar, although not identical. Most of it is indirection code.

Weijun Zhou · Accepted Answer

Not entirely sure this is what you expected, but normally indexing is achieved via projection, as @Jarod42 points out in the comments on the question. You can also use a custom view. Both approaches are shown below, with two closely-related variants shown for the custom view approach.

#include 
#include 
#include 
#include 
#include 

struct Value {
    int data;
public:
    Value(int value) : data(value) {}
    operator int() const { return data; }

    bool operator < (const Value& rhs) const { return data < int(rhs); }
};

struct Index {
public:
    int index;
public:
    Index(int i) : index(i) {}
};

// Reusable generic for the standard approach using projections
// Added following @Jarod42's comment on this answer
template
auto create_index_projection(R&& range) { 
    return [&range](const Index& i){
        return std::ranges::begin(range)[i.index]; 
    }; 
}

// Reusable generic for the custom view approach
template
auto indexing(R&& range){
    return std::views::transform([&range](const Index& i) -> decltype(auto) {
        return std::ranges::begin(range)[i.index];
    });
};

// Variant of the custom view approach, with the operands swapped
// Requires more work, but the order may be more "natural" and easier for composition
template
requires std::same_as, Index>
struct IndexObject {
    I&& indices;
    explicit IndexObject(I&& indices_):indices{std::forward(indices_)}{}
};

template 
IndexObject indexedby(I&& indices_){
    return IndexObject{std::forward(indices_)};
}

template 
auto operator | (R&& range, IndexObject index_obj){
    return std::forward(index_obj.indices) | indexing(std::forward(range));
}

int main()
{
    std::vector v1 = { 0,30,20,40,10 };
    std::vector i1 = { 0,4,2,1,3 };

    Value value(30);

    //Standard approach: Using projection
    auto range_pred1 = std::ranges::equal_range(i1, value, {}, create_index_projection(v1));
    std::cout << "*range_pred.first = " << v1[(*range_pred1.begin()).index].data << std::endl;
    std::cout << "*range_pred.second= " << v1[(*range_pred1.end()).index].data << std::endl;

    std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred1.begin(), range_pred1.end()) << std::endl;

    //Alternatve approach: Using custom view
    //May be the syntax you are looking for
    auto indexing_view = i1 | indexing(v1);

    auto range_pred2 = std::ranges::equal_range(indexing_view, value);
    std::cout << "*range_pred.first = " << (*range_pred2.begin()).data << std::endl;
    std::cout << "*range_pred.second= " << (*range_pred2.end()).data << std::endl;

    std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred2.begin(), range_pred2.end()) << std::endl;

    //Variant of above, with more "natural" order in syntax
    auto indexing_view2 = v1 | indexedby(i1);

    auto range_pred3 = std::ranges::equal_range(indexing_view2, value);
    std::cout << "*range_pred.first = " << (*range_pred3.begin()).data << std::endl;
    std::cout << "*range_pred.second= " << (*range_pred3.end()).data << std::endl;

    std::cout << std::endl << "range_pred.second-range_pred.first " << std::distance(range_pred3.begin(), range_pred3.end()) << std::endl;

    //Also works for rvalues provided that std::ranges::dangling is not returned by the algorithm
    bool binary_search_result1 = std::ranges::binary_search(std::vector{0,20,10}|indexedby(std::vector{0,2,1}), Value{10});
    bool binary_search_result2 = std::ranges::binary_search(std::vector{0,20,10}|indexedby(std::vector{0,2,1}), Value{15});
    std::cout << std::boolalpha;
    std::cout << std::endl << "binary search for 10 " << binary_search_result1 << std::endl;
    std::cout << "binary search for 15 " << binary_search_result2 << std::endl;
}

Demo: https://godbolt.org/z/389Y99Mnz

The adaptors indexing and indexedby are aimed at code deduplication. You only need to define once, and then use either index_range | indexing(indexed_range) or indexed_range | indexedby(index_range) to obtain a view which you can almost use as a reordered range. Only one of the two variants need to be defined so you can choose whichever syntax that you find the most natural. This addresses the transparent requirement in your question. So let's say you have the following

std::string s="abc"; std::array indices[]{1,2,0}; auto sView = s | indexedby(indices); //Or, equivalently //auto sView = indices | indexing(s);

Note that now indices and s are of completely different types from above, but with the same definition of the adaptors sView can be used as a random access range containing "bca". It is random access because the underlying range indices is also so. That means for example sView[2]=='a' and for(const auto c: sView){...} does what you expect them to do (the latter does not really require a random access range though), which is good enough for most applications.

It is also composable. So for example if you have another level of indirection using a index vector i2, in addition to i1 and v1, then you can just use the view auto view = v1 | indexedby(i1) | indexedby(i2) to access the elements transparently. Due to the restriction that overloaded [] must be a member function, a syntax like v1[indexedby(i1)][indexedby(i2)] is not possible.

Some more explanation about the definition of the IndexObject template: it is a thin wrapper (the extra cost in most C++ implementations being the memory of a single pointer) around a range of Index (e.g. std::vector in your example), and indexedby is actually just a more descriptive name for the make function of an IndexObject, it can just be called makeIndexObject just like make_pair is to std::pair.

Now, this thin wrapper around the original range of Index makes sure that the correct operator | is picked up for the syntax v1 | indexedby(i1). It could have just been v1 | i1, by just changing the signature of operator|, but I have decided that that is too error-prone and does not convey the intention well, so I have chosen to make this extra wrapper.

So to summarize its purpose in a sentence: it is to wrap (capture) an existing range of Index to make the syntax v1 | indexedby(i1) possible. In the implementation of the corresponding operator | of many range adaptors, this is called a closure.

For the "how it works" part. Most of the information are already presented above, but I can rephrase it in a way corresponding to the natural program flow.

indexedby wraps (captures) a range of Index, which is an IndexObject.

The capture is combined with the range to inspect, which returns a random access std::transform_view.

The std::transform_view can then be used for accessing transparently.

Conceptually, it is not much different from (Capture(i1))(v1) -> std::transform_view.

At this point it may be tempting to get rid of the classes Value and Index entirely, since the original intent is preventing the misuse of an index vector as a value one. Doing so is in fact perfectly safe in terms of type safety. The reason is as follows.

For the indexedby adaptor, since a range cannot be converted to an IndexObject thanks to the explicit constructor, and an IndexObject is not a valid range itself, it is now crystal clear that for the values you have to use a real range and for indices you have to use an IndexObject. There's no way they can get mixed up in the type system. Similarly for the indexing adaptor, for the values you have to use a range adaptor and for the indices you have to use a real range, and of course the standard library doesn't allow implicit conversions between them.

The following snippet from this demo clearly shows the type safety. For the definitions of the types involved in the demo, refer to the link to the demo itself. The demo is adapted from another demo provided by OP.

namespace TypeSafetyDemo{ namespace Approach1{ using Value = ValueObject&>; using Index = std::vector; using F = void(*)(Value, Index); static_assert(std::invocable); static_assert(!std::invocable); static_assert(!std::invocable); static_assert(!std::invocable); } namespace Approach2{ using Value = std::vector; using Index = IndexObject&>; using F = void(*)(Value, Index); static_assert(std::invocable); static_assert(!std::invocable); static_assert(!std::invocable); static_assert(!std::invocable); } }

An additional note is that since both adaptors capture the underlying range by reference, the usual warns about lifetime and reference members apply. Don't store an indexedby(i1) somewhere and use it after the lifetime of i1 ends. For a similar reason, having a function signature of f(std::vector, IndexObject&&> index_obj) is dangerous and not very useful, as the underlying vector will be moved to the owning_view in the first use of index_obj, and subsequent uses of index_obj will be invalid. I've also modified the template IndexObject to make IndexObject> ill-formed to prevent misuses similar to this.

If you really want to make f(std::vector, IndexObject&&>) useful and safe, here provides an alternative. It only moves the underlying vector if both that vector and the IndexObject itself are rvalues.

About the suggestion from a now deleted comment: R range with random_access_range R does not guarantee range[i.index] is well-formed. In fact the concept does not say much about the member functions of R itself, only about its iterators. So in order to be general enough it is still necessary to use std::ranges::begin(range)[i.index]. Although all random access ranges in the standard library do provide operator[].

Transparent indexes for STL algorithms in C++

Answers (1)

Related Questions