shoosh

Reputation: 79013

C/C++: Inherent ambiguity of "\xNNN" format in literal strings

Consider these two strings:

const wchar_t* x = L"xy\x588xla";
const wchar_t* y = L"xy\x588bla";

Reading this, you would expect the two string literals to be identical except for one character: an 'x' instead of a 'b'.
It turns out that this is not the case. The first string compiles to:

x = {'x', 'y', 0x588,  'x', 'l', 'a' }

and the second is actually:

y = {'x', 'y', 0x588b, 'l', 'a' }

They are not even the same length!
Yes, the 'b' is eaten up by the hex escape sequence ('\xNNN'): 'b' is a valid hex digit, so it is read as part of the escape, whereas 'x' is not and ends it.
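
The difference can be checked at compile time. This is only a minimal sketch, assuming a C++11 or later compiler; the names `with_x`, `with_b` and `length` are made up for illustration:

```cpp
#include <cstddef>

// Array form (rather than a pointer) keeps each literal's size visible.
constexpr wchar_t with_x[] = L"xy\x588xla";  // 'x' is not a hex digit: the escape stops at \x588
constexpr wchar_t with_b[] = L"xy\x588bla";  // 'b' is a hex digit: the escape extends to \x588b

// Character count, excluding the terminating null.
template <std::size_t N>
constexpr std::size_t length(const wchar_t (&)[N]) { return N - 1; }

static_assert(length(with_x) == 6, "x, y, 0x588, x, l, a");
static_assert(length(with_b) == 5, "x, y, 0x588b, l, a");
static_assert(with_x[2] == 0x588 && with_x[3] == L'x', "escape ended before 'x'");
static_assert(with_b[2] == 0x588b && with_b[3] == L'l', "escape swallowed 'b'");
```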

At the very least, this could cause confusion and subtle bugs in hand-written strings (though you could argue that Unicode strings don't belong in the code body).

But the more serious problem, and the one I am facing, is in auto-generated code. There just doesn't seem to be any way to express {'x', 'y', 0x588, 'b', 'l', 'a' } as a string literal without resorting to writing the entire string in hex representation, which is wasteful and unreadable.

Any idea of a way around this?
What's the sense in the language behaving like this?

Upvotes: 8

Views: 3206

Answers (1)

Cheers and hth. - Alf

Reputation: 145419

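A simple way is to use compile-time string literal concatenation. Adjacent literals are joined only after the escape sequences in each one have been converted, so the escape cannot run on into the next piece: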

wchar_t const* y = L"xy\x588" L"bla";
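
For the auto-generated case, the same trick can be applied mechanically. The following is a rough sketch rather than a finished generator: `to_wide_literal` is a hypothetical helper name, it assumes C++11, and it only handles printable ASCII plus `\x` escapes (trigraphs and other corner cases are ignored). It closes and reopens the literal whenever a hex escape is followed by a hex digit:

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// The concatenated literal holds the six intended characters.
constexpr wchar_t fixed[] = L"xy\x588" L"bla";
static_assert(sizeof(fixed) / sizeof(fixed[0]) - 1 == 6, "x, y, 0x588, b, l, a");
static_assert(fixed[2] == 0x588 && fixed[3] == L'b', "'b' stays a separate character");

// Hypothetical generator helper: renders `text` as source code for a wide
// string literal. After a \x escape it splits the literal if the next
// character is a hex digit, so the escape cannot absorb it.
std::string to_wide_literal(const std::wstring& text) {
    std::string out = "L\"";
    bool after_hex_escape = false;
    for (wchar_t c : text) {
        const unsigned long code = static_cast<unsigned long>(c);
        if (code >= 0x20 && code < 0x7f) {           // printable ASCII: emit as-is
            const char ch = static_cast<char>(code);
            if (after_hex_escape && std::isxdigit(static_cast<unsigned char>(ch)))
                out += "\" L\"";                     // close and reopen the literal
            if (ch == '"' || ch == '\\') out += '\\';
            out += ch;
            after_hex_escape = false;
        } else {                                     // everything else: hex escape
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\x%lx", code);
            out += buf;
            after_hex_escape = true;
        }
    }
    out += '"';
    return out;
}
```

Splitting only when the next character is a hex digit keeps the generated source close to what you would write by hand; splitting after every escape would also be correct, just noisier.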

Upvotes: 14
