Reputation: 12279
Let's consider:
char const str[] = u8"ñ";
auto const* u8_code_units = reinterpret_cast<unsigned char const*>(str);
// using u8_code_units elements
Is that fully portable and C++ standard compliant? Or is there some clause that says it is undefined behaviour or depends on an unspecified value? I know that unsigned char and char shall have the same alignment requirements, and that reinterpret_cast<T*>(v) is in that case equivalent to static_cast<T*>(static_cast<void*>(v)), so I think it is completely safe and portable to use, but I'm not sure.
Upvotes: 0
Views: 165
Reputation: 473916
Is that fully portable and C++ standard compliant?
Kinda, but not for the reason you think.
See, you have to actually save that file to disk in some format. Which means your compiler has to be able to read that same format. And what text formats a compiler supports is implementation-defined.
However, if your compiler supports the format you saved it in, and that format can save Unicode-encoded characters, then your compiler will do the right thing here.
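One way to sidestep the source-encoding question is to spell the character as a universal-character-name, so the file itself stays pure ASCII. The following is only a minimal sketch; the name n_tilde_size is made up for illustration. It checks the size of the literal rather than its element type, because the element type of a u8 literal is char up to C++17 and char8_t from C++20.

```cpp
#include <cstddef>

// "\u00F1" is the universal-character-name for U+00F1 (ñ). The compiler
// encodes it as UTF-8 inside a u8 literal regardless of how the source
// file was saved.
// n_tilde_size is a hypothetical name used only for this example.
constexpr std::size_t n_tilde_size = sizeof(u8"\u00F1");

// Two UTF-8 code units (0xC3, 0xB1) plus the terminating null character.
static_assert(n_tilde_size == 3, "expected 2 code units + terminator");
```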
Even the reinterpret_cast is fine, because the standard allows a char array to be accessed through a pointer to unsigned char, even if the platform's char is signed. And the standard explicitly requires that, when reading a UTF-8 encoded char array through unsigned char, you get the bits you expect from the UTF-8 encoding.
Note however:
I know that unsigned char and char shall have the same alignment requirements and reinterpret_cast<T*>(v) equals in that case to static_cast<T*>(static_cast<void*>(v)),
That would not be enough to protect you. It works because the standard explicitly says that it works in this particular case, not because of alignment requirements and such. char and unsigned char have exceptions to the rules on aliasing to allow this; alignment has nothing to do with it.
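As a rough illustration of the aliasing exception, here is a sketch that reads a char buffer through unsigned char. The helper names code_units and n_tilde_units are hypothetical, and the bytes of "ñ" are written as explicit hex escapes so that the example depends neither on the source-file encoding nor on the type of u8 literals (which changed to char8_t in C++20).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the bytes of a char buffer as unsigned char.
std::vector<unsigned char> code_units(char const* s, std::size_t n) {
    assert(s != nullptr);
    // Permitted by the aliasing rules: any object may be examined through
    // unsigned char, so this read is well defined even when char is signed.
    auto const* p = reinterpret_cast<unsigned char const*>(s);
    return std::vector<unsigned char>(p, p + n);
}

// Hypothetical helper: the UTF-8 code units of U+00F1 (ñ).
std::vector<unsigned char> n_tilde_units() {
    char const str[] = "\xC3\xB1";           // UTF-8 encoding of ñ
    return code_units(str, sizeof str - 1);  // drop the terminating '\0'
}
```

Calling n_tilde_units() should give two elements, 0xC3 and 0xB1, whether or not plain char is signed on the platform. Note that the cast keeps the const qualifier; reinterpret_cast cannot remove it.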
Upvotes: 2