Reputation: 79013
Consider these two strings:
wchar_t* x = L"xy\x588xla";
wchar_t* y = L"xy\x588bla";
Upon reading this, you would expect the two string literals to be the same except for one character: an 'x' instead of a 'b'.
It turns out that this is not the case. The first string compiles to:
x = {'x', 'y', 0x588, 'x', 'l', 'a' }
and the second is actually:
y = {'x', 'y', 0x588b, 'l', 'a' }
They are not even the same length!
Yes, the 'b' is eaten up by the hex escape sequence ('\xNNN'): 'b' is a valid hex digit, so it is read as part of the escape, whereas the 'x' in the first string is not.
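A quick way to confirm this is to check the lengths directly. This is only a minimal sketch; the names `with_x`, `with_b` and `check_greedy_escape` are made up for illustration and are not part of my real code.

```cpp
#include <cassert>
#include <cwchar>

// The two literals from above, renamed for clarity (hypothetical names).
const wchar_t* const with_x = L"xy\x588xla";
const wchar_t* const with_b = L"xy\x588bla";

// Hypothetical helper: asserts the element counts described above.
void check_greedy_escape() {
    // 'x' is not a hex digit, so the escape stops after 588.
    assert(std::wcslen(with_x) == 6);
    assert(with_x[2] == L'\x588');
    assert(with_x[3] == L'x');

    // 'b' is a hex digit, so it is absorbed and the escape becomes 0x588b.
    assert(std::wcslen(with_b) == 5);
    assert(with_b[2] == L'\x588b');
    assert(with_b[3] == L'l');
}
```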
At the very least, this could cause confusion and subtle bugs in hand-written strings (you could argue that Unicode strings don't belong in the code body).
But the more serious problem, and the one I am facing, is in auto-generated code. There just doesn't seem to be any way to express {'x', 'y', 0x588, 'b', 'l', 'a' } as a string literal without resorting to writing the entire string in hex representation, which is wasteful and unreadable.
Any idea of a way around this?
What's the sense in the language behaving like this?
Upvotes: 8
Views: 3206
Reputation: 145419
A simple way is to use compile-time string literal concatenation, which ends the hex escape at the closing quote of the first piece, thus:
wchar_t const* y = L"xy\x588" L"bla";
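For the auto-generated case, the same trick can be applied mechanically: close and reopen the literal after every hex escape, so the escape can never absorb the character that follows. The sketch below is one possible way to do that, not a definitive implementation. `to_wide_literal` and `check_generator` are hypothetical names, and the sketch assumes that printable ASCII is emitted as-is and everything else as a `\x` escape; it does not handle trigraphs or other corner cases.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders a wide string as the source text of a
// C++ wide string literal, splitting the literal after each hex escape.
std::string to_wide_literal(const std::wstring& text) {
    std::string out = "L\"";
    for (wchar_t ch : text) {
        const unsigned long code = static_cast<unsigned long>(ch);
        if (ch == L'"' || ch == L'\\') {
            // Quote and backslash need a backslash escape.
            out += '\\';
            out += static_cast<char>(ch);
        } else if (code >= 0x20 && code < 0x7f) {
            // Printable ASCII is copied through unchanged.
            out += static_cast<char>(ch);
        } else {
            // Everything else becomes a hex escape, after which the
            // literal is closed and reopened to terminate the escape.
            char buf[32];
            std::snprintf(buf, sizeof buf, "\\x%lx", code);
            out += buf;
            out += "\" L\"";
        }
    }
    out += '"';
    return out;
}

// Usage check: the generated text matches the hand-written split literal.
void check_generator() {
    assert(to_wide_literal(L"xy\x588" L"bla") == "L\"xy\\x588\" L\"bla\"");
}
```

An escape at the very end of the input produces a trailing empty piece (`L""`), which is harmless because adjacent literals are concatenated anyway.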
Upvotes: 14