wimalopaan

Reputation: 5472

Examples using reinterpret_cast that do not trigger UB

Reading https://en.cppreference.com/w/cpp/language/reinterpret_cast, I wonder which use cases of reinterpret_cast are not UB and are actually used in practice.

The description linked above lists many cases where it is legal to convert a pointer to some other type and then back. But that seems to be of little practical use. Accessing an object through a reinterpret_cast pointer is mostly UB due to violations of strict aliasing (and/or alignment), except when accessing it through a char*/byte* pointer.

One helpful exception is casting an integer-constant to a pointer and accessing the target object, which is useful for manipulation of HW-registers (in µC).

Can anyone name some relevant, real use cases of reinterpret_cast that are used in practice?

Upvotes: 26

Views: 2951

Answers (3)

user17732522

Reputation: 76829

Some examples that come to mind:

  • Reading/writing the object representation of a trivially-copyable object, for example to write the byte representation of the object to a file and read it back:

    // T must be a trivially-copyable object type!
    T obj;
    
    //...
    
    // Write the object representation of obj to a file.
    std::ofstream out_file(/*...*/);
    out_file.write(reinterpret_cast<char*>(&obj), sizeof(obj));
    
    //...
    
    // Read the bytes back into obj.
    std::ifstream in_file(/*...*/);
    in_file.read(reinterpret_cast<char*>(&obj), sizeof(obj));
    

    Technically, it is currently not really specified how accessing the object representation should work, aside from passing the pointer directly to memcpy et al. There is, however, a current proposal for the standard to clarify at least how reading (but not writing) individual bytes in the object representation should work; see https://github.com/cplusplus/papers/issues/592.

  • Reinterpreting between signed and unsigned variants of the same integral type, especially char and unsigned char for strings, which may be useful if an API expects an unsigned string.

    auto str = "hello world!";
    auto unsigned_str = reinterpret_cast<const unsigned char*>(str);
    

    While this is allowed by the aliasing rules, technically pointer arithmetic on the resulting unsigned_str pointer is currently not defined by the standard. But I don't really see why it isn't.

  • Accessing objects nested within a byte buffer (especially on the stack):

    alignas(T) std::byte buf[42*sizeof(T)];
    new(buf+sizeof(T)) T;
    
    // later
    
    auto ptr = std::launder(reinterpret_cast<T*>(buf + sizeof(T)));
    

    This works as long as the address buf + sizeof(T) is suitably aligned for T, the buffer has type std::byte or unsigned char, and obviously is of sufficient size. The new expression also returns a pointer to the object, but one might not want to store that for each object. If all objects stored in the buffer are the same type, it would also be fine to use pointer arithmetic on a single such pointer.

  • Obtaining a pointer to a specific memory address. Whether and for which address values this is possible is implementation-defined, as is any possible use of the resulting pointer:

    auto ptr = reinterpret_cast<void*>(0x12345678);
    
  • Casting a void* returned by dlsym (or a similar function) to the actual type of a function located at that address. Whether this is possible and what exactly the semantics are is again implementation-defined:

    // my_func is a C linkage function with type `void()` in `my_lib.so`
    
    // error checking omitted!
    
    auto lib = dlopen("my_lib.so", RTLD_LAZY);
    
    auto my_func = reinterpret_cast<void(*)()>(dlsym(lib, "my_func"));
    
    my_func();
    
  • Various round-trip casts may be useful to store pointer values or for type erasure.

    Round-trip of an object pointer through void* requires only static_cast on both sides and reinterpret_cast on object pointers is defined in terms of a two-step static_cast through (cv-qualified)void* anyway.

    Round-trip of an object pointer through std::uintptr_t, std::intptr_t, or another integral type large enough to hold all pointer values may be useful for having a representation of the pointer value that can be serialized (although I am not sure how often that really is useful). It is however implementation-defined whether any of these types exist. Typically they will, but the standard permits exotic platforms where memory addresses cannot be represented as single integer values or where all integer types are too small to cover the address space. I would also be wary of the compiler's pointer analysis causing issues depending on how you use this; see e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 as just the first bug report I found. The standard isn't particularly clear on how the integer -> pointer cast is supposed to work, especially when considering pointer provenance. See for example https://open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2318r1.pdf and other documents linked therein.

    Round-trip of a function pointer through any arbitrary function pointer type (likely void(*)()) may be useful to erase the type from arbitrary functions, although again I am not sure how often that really is useful. void* type-erased arguments are common in C APIs when a function just passes through data, but type-erased function pointers like that are less common.

    A round-trip cast of a function pointer through void* may be used in a similar way as above, as dlsym essentially does with the additional dynamic library complication. This is conditionally-supported only, although it is effectively required for POSIX systems. (It is not generally supported, because object and function pointer values may have distinct representations, size, alignment etc. on some more exotic platforms.)
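
As a rough sketch of the round-trip casts from the last bullet: the helper names below (to_integer, from_integer, erase, restore, add_one, round_trips_ok) are made up for illustration, and the code assumes a platform where std::uintptr_t exists. The pointer and the function pointer are only used after being cast back to their original types, which is the part the standard guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: store an object pointer as an integer and restore it.
// Assumption: std::uintptr_t is available (it is an optional type).
inline std::uintptr_t to_integer(int* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
inline int* from_integer(std::uintptr_t v) {
    return reinterpret_cast<int*>(v);
}

// Hypothetical type-erased function pointer type.
using erased_fn = void (*)();

inline int add_one(int x) { return x + 1; }

// Erase the function type; the result must not be called as void().
inline erased_fn erase(int (*f)(int)) {
    return reinterpret_cast<erased_fn>(f);
}
// Restore the original function type before calling.
inline int (*restore(erased_fn f))(int) {
    return reinterpret_cast<int (*)(int)>(f);
}

bool round_trips_ok() {
    int value = 41;

    // Object pointer -> integer -> same pointer type yields the original value.
    int* restored = from_integer(to_integer(&value));
    const bool same_pointer = (restored == &value);
    *restored = 42;

    // Function pointer -> other function pointer type -> original type.
    int (*f)(int) = restore(erase(&add_one));
    const bool same_function = (f == &add_one);

    return same_pointer && same_function && value == 42 && f(41) == 42;
}
```

Calling the erased pointer directly, or casting the integer back to a different pointer type, would not be covered by these guarantees.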

Upvotes: 35

supercat

Reputation: 81247

The most common situations where reinterpret_cast is used without undefined behavior involve dialects which extend the C++ language by specifying how they will behave in more situations than mandated by the Standard (i.e. defining the behavior). Although the C++ Standard would allow implementations to treat programs that "violate" the aliasing rules as erroneous, the Standard doesn't require that such programs be viewed that way. According to the C++ Standard itself:

Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs.... If a program contains a violation of a rule for which no diagnostic is required, this document places no requirement on implementations with respect to that program.

Nearly all practical implementations can be configured to extend the semantics of the language by processing reinterpret_cast in a manner consistent with the representations of the objects involved without regard for whether the Standard would require that they do so. The fact that reinterpret_cast allows a consistent syntax for non-portable constructs that exploit such extensions is more broadly useful than most of the "portable" ways the construct may be used.

Upvotes: 4

VL-80

Reputation: 614

Another real-world example of reinterpret_cast is calling the various network-related functions that accept a struct sockaddr * parameter, such as recvfrom(), bind(), or accept().

For example, the following is the declaration of the recvfrom function:

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

Its fifth argument is defined as struct sockaddr *src_addr and it acts as a general interface for accepting a pointer to a structure of a specific address type (for example, sockaddr_in or sockaddr_in6).

Beej's Guide to Network Programming says:

In memory, the struct sockaddr_in and struct sockaddr_in6 share the same beginning structure as struct sockaddr, and you can freely cast the pointer of one type to the other without any harm, except the possible end of the universe.

Just kidding on that end-of-the-universe thing…if the universe does end when you cast a struct sockaddr_in* to a struct sockaddr*, I promise you it’s pure coincidence and you shouldn’t even worry about it.

So, with that in mind, remember that whenever a function says it takes a struct sockaddr* you can cast your struct sockaddr_in*, struct sockaddr_in6*, or struct sockaddr_storage* to that type with ease and safety.

For example:

int fd; // file descriptor value obtained elsewhere
struct sockaddr_in addr {};
socklen_t addr_len = sizeof(addr);
std::vector<std::uint8_t> buffer(4096);
    
// recvfrom returns ssize_t (-1 on error), so avoid narrowing to int
const ssize_t bytes_recv = recvfrom(fd, buffer.data(), buffer.size(), 0,
                                    reinterpret_cast<sockaddr*>(&addr), &addr_len);
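
The same pattern applies to bind() and getsockname(). The following is only a sketch, assuming a POSIX system. The function name bound_loopback_port is made up for illustration. The cast happens only at the call boundary, and the result is read back through the original sockaddr_in object rather than through the sockaddr*.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper (not part of any API): binds a UDP socket to
// 127.0.0.1 with an OS-chosen port and returns that port in host byte
// order, or -1 if any system call fails.
int bound_loopback_port() {
    const int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // port 0 asks the OS to pick a free port

    // The socket API takes the generic sockaddr*, so the concrete
    // sockaddr_in* is cast only when passing it to the call.
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }

    sockaddr_in bound{};
    socklen_t len = sizeof(bound);
    const int rc = getsockname(fd, reinterpret_cast<sockaddr*>(&bound), &len);
    close(fd);
    if (rc != 0 || bound.sin_family != AF_INET) return -1;

    // Read the result through the sockaddr_in object itself.
    return ntohs(bound.sin_port);
}
```

If the address family is not known in advance, a sockaddr_storage object can be passed in the same way, since it is large enough for any address type.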

Upvotes: 10
