alfC
alfC

Reputation: 16300

Tricks to avoid pointer aliasing in generic code

This is a follow up to Restricted access for allocated arrays belonging to separate object after discovering that returning by value across compilation units doesn't help to hint the compiler about disjoint memory allocations.

I have two functions, make_v1 and make_v2, that return an array-like object, by value. (This could be std::vector for example, the important point is that the copy constructor clearly allocates memory).

#include<vector>

template<class It> void modify_v(It);

void dreaded_function();

template<class It1, class It2>
void foo(It1 a, It2 b) {
    *a = 5;
    *b = 6;
    if(*a == *b) dreaded_function();  // this is never called if a and b do not overlap
}

int main() {
    std::vector<int> v1 = make_v1();
    std::vector<int> v2 = make_v2();

    foo(v1.begin(), v2.begin());
}

As you can see in the compiled code, there is no assumption made optimizing the call to dreaded_function. This is just an example of pointer aliasing, just to diagnose the situation. There are well-known cases of pointer aliasing creating worst performance problems.

https://godbolt.org/z/345PKrsnx

What high-level hints can I give the compiler, in C++ (or perhaps using extensions), that the iterator ranges are at memory regions that cannot overlap?

If this were C, or if I were using pointers and C++ extensions, I could use __restrict and a pointer interface, but I want something more high-level.

The only option that I found was to be very explicit about the copy:

...
std::vector<int> v1 = static_cast<std::vector<int> const&>(make_v1());
std::vector<int> v2 = static_cast<std::vector<int> const&>(make_v2());

This at least convinces clang that it can make the optimization (and at the cost of an extra copy I am pretty sure).

https://godbolt.org/z/nYfKeTKqj

But this is ugly and it is also making a copy, also if make_v becomes an included function later, the solution will have a cost. (Also it still doesn't convince GCC!)

Is this the only way I can assure the compiler that v1 and v2 memory do not overlap? Is there a more streamlined solution? Should [[assume]] work for this eventually?

At this point, the best I am hoping is some keyword/extension that "simulates" that the result of make_v1/make_v2 is copied into v1/v2.

Upvotes: 5

Views: 204

Answers (2)

Quuxplusone
Quuxplusone

Reputation: 27334

The general way to convince the compiler of an arbitrary boolean predicate is to __builtin_assume it, something like this:

#include <version>
#if defined(__clang__)
 #define ASSUME(x) __builtin_assume(x)
#elif defined(_MSC_VER)
 #define ASSUME(x) __assume(x)
#elif __has_builtin(__builtin_unreachable)
 #define ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#elif defined(__cpp_lib_unreachable)
 #include <utility>
 #define ASSUME(x) do { if (!(x)) std::unreachable(); } while (0)
#endif

(GCC supports __builtin_unreachable but not __builtin_assume. MSVC supports it but only as __assume, not __builtin_assume.)

Unfortunately, no existing compiler seems to understand the following sequence (Godbolt):

int misunderstood(int *p, int *q) {
  ASSUME(p != q);
  *p = 1;
  *q = 2;
  return *p;  // should return "1", surely
}

OTOH, both GCC 10+ and Clang 11+ understand this sequence just fine:

bool understood(int *p, int *q) {
  ASSUME(p != q);
  *p = 1;
  *q = 2;
  return (p == q);  // returns "false"
}

so you'd think there's no actual reason they couldn't go on to understand that, if two pointers point different places, then modifying one place can't possibly modify the other. And vice versa, it's ironic that neither GCC nor Clang understand that two independently __restrict'ed pointers can't be equal. It's almost as if both compilers have an internal idea of "aliasing" which is somehow orthogonal to "equaling."

Upvotes: 1

alfC
alfC

Reputation: 16300

I found a hacky way to do this. I realized that I can use the __restrict keyword in member (pointer) variables.

So, basically, if I "reimplement" vector iterator with a restrict pointer then I can get away with this, in principle.

In practice this only helps on GCC, and not in clang.

template<class Iterator>
struct restricted_vector_iterator {
    explicit restricted_vector_iterator(Iterator it) : p_{&*it} {}

    decltype(auto) operator*() const {return *p_;}

    decltype(auto) operator++(int) {++p_; return *this;}

 private:
    typename Iterator::pointer __restrict p_;
};


...
    foo(restricted_vector_iterator{v1.begin()}, restricted_vector_iterator{v2.begin()});

https://godbolt.org/z/oE1f1vnPr

This looks like casting the iterator to pointer but in generic code a restrict class could work differently for different iterator types. Also I am not modifying "library" code, only user code. (The user knows that the ranges do not overlap.)

Of course this is not ideal.

First, It doesn't scale well, because every critical iterator will have to be wrapped/reimplemented (__restrict cannot be applied to the iterator class, only to the contained pointer). This would be much simpler if __restrict worked recursively for the pointer member inside a class (specifically an iterator.)

Second, it sounds kind of dangerous that one can keep a restricted pointer alive in the wild, maybe I can make this wrapper iterator non-copyable or at least "hard to copy".

Third, it is not clear to me how pointer arithmetic will work (are pointers calculated from other restricted pointers, also restricted?, are restricted among them?)

Fourth, it is not portable, both because __restrict is an extension, and because at the end it only does the optimization on GCC it seems.

If you have any comments let me know.

Finally, if I had control over the container, I could have a different method like v1.RESTRICT_begin() for example.

Upvotes: 3

Related Questions