Evan Teran

Reputation: 90432

Type punning and unions

So, there are a few questions on SO about this subject, but I haven't found one that answers exactly the question I have in mind. First, some background:

I would like to have a uint32_t field, which I can also access as an array of bytes.

So the first thing that comes to mind is:

#include <cstdint>

union U {
    std::uint32_t u32;
    std::uint8_t bytes[sizeof(std::uint32_t)];
};

Which allows me to do this:

// "works", but is UB as far as I understand
U u;
u.u32 = 0x11223344;
u.bytes[0] = 0x55;

OK, so undefined behavior (UB) is bad, therefore we don't want to do that. Similarly, pointer casts are UB, and they can sometimes be even worse because of alignment concerns (though not in this case, because I'm using a char-sized type for my array).

// "works", but is UB as far as I understand
uint32_t v = 0x11223344;
auto p = reinterpret_cast<uint8_t *>(&v);
p[0] = 0x55;

Once again, UB is bad, therefore we don't want to do that.

Some say that this is OK if we use a char* instead of a uint8_t*:

// "works", but maybe is UB?
uint32_t v = 0x11223344;
auto p = reinterpret_cast<char *>(&v);
p[0] = 0x55;

But I am honestly not sure about it. So, getting creative:


If I remember correctly, it is legal to read the contents of an object through a void* cast to a char* (this is what allows things like std::memcpy to not be UB). So maybe we can play with this:

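#include <cstddef>
#include <cstdint>

// Read byte n of the object that p points to, through a char lvalue.
std::uint8_t get_byte(const void *p, std::size_t n) {
    auto ptr = static_cast<const char *>(p);
    return ptr[n];
}

// Overwrite byte `index` of the object that p points to.
void set_byte(void *p, std::size_t index, std::uint8_t v) {
    auto ptr = static_cast<char *>(p);
    ptr[index] = v;
}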

// "works", is this UB?
uint32_t v = 0x11223344;
uint8_t v1 = get_byte(&v, 0); // read
set_byte(&v, 0, 0x55);        // write
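For experimentation, here is a self-contained variant of the two helpers above. It keeps the names get_byte and set_byte but uses unsigned char in place of char (both are character types, which the aliasing rules allow to access any object; unsigned char also avoids an implementation-defined conversion before C++20 when char is signed and the stored value exceeds 127). The function demo_set_first_byte is a hypothetical helper added only to show that which byte index 0 refers to depends on endianness.

```cpp
#include <cstddef>
#include <cstdint>

// Read byte n of the object that p points to, through an unsigned char lvalue.
std::uint8_t get_byte(const void *p, std::size_t n) {
    auto ptr = static_cast<const unsigned char *>(p);
    return ptr[n];
}

// Overwrite byte `index` of the object that p points to.
void set_byte(void *p, std::size_t index, std::uint8_t v) {
    auto ptr = static_cast<unsigned char *>(p);
    ptr[index] = v;
}

// Hypothetical helper: replace the first byte in memory of 0x11223344 with 0x55.
// A little-endian host yields 0x11223355, a big-endian host 0x55223344.
std::uint32_t demo_set_first_byte() {
    std::uint32_t v = 0x11223344;
    set_byte(&v, 0, 0x55);
    return v;
}
```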

So my questions are:

  1. Is the final example I came up with UB?

  2. If it is, what is the "right" way to do this? I really hope the "correct" way isn't a memcpy to and from a byte array. That would be ridiculous.

  3. (BONUS): suppose I want my get_byte to return a reference (like for implementing operator[]). Is it safe to use uint8_t instead of literal char when reading the contents of a void *?
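For comparison with question 2, here is a sketch of the memcpy round trip, assuming the value is a plain std::uint32_t. It is well-defined for trivially copyable types, and optimizing compilers such as GCC and Clang typically turn small fixed-size memcpy calls like these into ordinary loads and stores, so the extra copies usually disappear from the generated code. The names to_bytes, from_bytes and set_first_byte_via_memcpy are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes = std::array<unsigned char, sizeof(std::uint32_t)>;

// Copy the object representation of v into a byte array.
Bytes to_bytes(std::uint32_t v) {
    Bytes out{};
    std::memcpy(out.data(), &v, sizeof v);
    return out;
}

// Rebuild a uint32_t from a (possibly modified) byte array.
std::uint32_t from_bytes(const Bytes &in) {
    std::uint32_t v;
    std::memcpy(&v, in.data(), sizeof v);
    return v;
}

// Hypothetical helper: the question's "set byte 0 to 0x55", done via memcpy.
std::uint32_t set_first_byte_via_memcpy(std::uint32_t v) {
    Bytes bytes = to_bytes(v);
    bytes[0] = 0x55;
    return from_bytes(bytes);
}
```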

NOTE: I understand the concerns regarding endianness and portability. They are not a problem for my use case. I think that it is acceptable for the result to be an "unspecified value" (in that it is implementation-specific which byte it will read). My question is really focused on the UB aspects ("nasal demons" and similar).
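For the bonus question, a minimal sketch of an operator[] that hands out references, assuming the goal is a byte view over an existing uint32_t. It returns unsigned char& rather than std::uint8_t&, because unsigned char is guaranteed to be a character type, while std::uint8_t is only usually a typedef for it. The class name ByteView is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning view over the bytes of a uint32_t.
class ByteView {
public:
    explicit ByteView(std::uint32_t &v)
        : p_(reinterpret_cast<unsigned char *>(&v)) {}

    // Reference to byte i of the viewed object; requires i < size().
    unsigned char &operator[](std::size_t i) { return p_[i]; }
    const unsigned char &operator[](std::size_t i) const { return p_[i]; }

    static constexpr std::size_t size() { return sizeof(std::uint32_t); }

private:
    unsigned char *p_;  // points at the first byte of the viewed uint32_t
};
```

Writing through the reference modifies the underlying integer directly, for example `ByteView view(v); view[0] = 0x55;`.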

Upvotes: 1

Views: 396

Answers (2)

Jarod42

Reputation: 217275

Why not create a class for that?

Something like:

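#include <array>
#include <cstdint>

class MyInt32 {
public:
    // Assemble the value from the stored bytes (b[0] is the least significant byte).
    // Each byte is widened to uint32_t first, so b[3] << 24 cannot overflow an int.
    std::uint32_t asInt32() const {
        return std::uint32_t{b[0]}
             | (std::uint32_t{b[1]} << 8)
             | (std::uint32_t{b[2]} << 16)
             | (std::uint32_t{b[3]} << 24);
    }
    // Store the value byte by byte, least significant byte first.
    void setInt32(std::uint32_t i) {
        b[0] = (i & 0xFF);
        b[1] = ((i >> 8) & 0xFF);
        b[2] = ((i >> 16) & 0xFF);
        b[3] = ((i >> 24) & 0xFF);
    }
    const std::array<std::uint8_t, 4u>& asInt8() const { return b; }
    std::array<std::uint8_t, 4u>& asInt8() { return b; }
    void setInt8s(const std::array<std::uint8_t, 4u>& a) { b = a; }
private:
    std::array<std::uint8_t, 4u> b{};  // zero-initialized so asInt32() is always defined
};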

So you don't have UB, you don't break aliasing rules, and you manage endianness explicitly, however you want.

Upvotes: 3

Puppy

Reputation: 146930

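It's perfectly legitimate (as long as the type is a POD). As for the bonus question, uint8_t is not guaranteed to be legal, because the standard does not require it to be a character type, so don't use it.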
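The answer's condition can be enforced at compile time. Below is a minimal sketch, assuming C++17, of a generic byte accessor that rejects types that are not trivially copyable (the modern counterpart of the POD requirement for this purpose) and accesses the bytes through unsigned char rather than std::uint8_t. The function name byte_ref is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: reference to byte i of obj's object representation.
// Only trivially copyable types are accepted; i must be < sizeof(T).
template <typename T>
unsigned char &byte_ref(T &obj, std::size_t i) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte access requires a trivially copyable type");
    return reinterpret_cast<unsigned char *>(&obj)[i];
}
```

Usage looks like `byte_ref(v, 0) = 0x55;` on a std::uint32_t v; which value v ends up with still depends on the platform's endianness.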

Upvotes: 0
