hanshenrik

Reputation: 21513

In UTF-8, is there any multi-byte character containing the byte \x27 / chr(39) / ' (the single-quote character)?

As the title says: in UTF-8, is there any multi-byte character whose encoding contains the byte \x27 / chr(39) / ' (the single-quote character)?

You may wonder why anyone would want to know that. Well, it came up while I was trying to bypass this function:

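function quoteLinuxShellArgument(string $argument): string {
    if (false !== strpos($argument, "\x00")) {
        // A NUL byte cannot be represented inside a shell argument at all.
        throw new InvalidArgumentException("it is impossible to quote null bytes in shell arguments");
    }
    // Wrap in single quotes; each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
    return "'" . str_replace("'", "'\\''", $argument) . "'";
}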

One of my first questions was the one in the title: is there any such character?

Upvotes: 1

Views: 913

Answers (2)

Remy Lebeau

Reputation: 597906

In UTF-8, any Unicode codepoint that is outside of the ASCII range (U+0000 - U+007F) is required to be encoded using multiple bytes. All of those bytes will have their high bit set to 1.

So no, byte 0x27 (0b00100111) has its high bit clear and will never appear in a multi-byte sequence. 0x27 can only ever be used to encode codepoint U+0027 APOSTROPHE as a single byte.
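For anyone who wants to see this empirically, here is a minimal sketch that brute-forces every encodable code point. It is written in Python rather than the question's PHP, and the function name is made up for illustration; it only restates the rule above, it does not add anything to it.

```python
# Exhaustive check: does any multi-byte UTF-8 encoding contain an
# ASCII-range byte (high bit clear), such as 0x27?

def multibyte_encodings_with_ascii_bytes() -> list:
    """Return code points whose multi-byte UTF-8 form contains a byte < 0x80.

    Hypothetical helper name, used only for this illustration.
    """
    offenders = []
    for cp in range(0x80, 0x110000):      # everything outside ASCII
        if 0xD800 <= cp <= 0xDFFF:
            continue                      # surrogates cannot be encoded in UTF-8
        encoded = chr(cp).encode("utf-8")
        if any(b < 0x80 for b in encoded):
            offenders.append(cp)
    return offenders


offenders = multibyte_encodings_with_ascii_bytes()
print(len(offenders))  # 0
```

The loop visits a little over a million code points, so it takes a moment to run, but the list of offenders comes back empty.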


Upvotes: 4

Mark Ransom

Reputation: 308490

All of the multi-byte UTF-8 characters have the upper bit set, so there's no chance of colliding with a regular ASCII character. That includes your single quote.
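One practical consequence for the function in the question: a byte-oriented replace (which is how PHP's str_replace works) gives the same result as a character-aware one on UTF-8 input. The sketch below is a Python approximation of that function, not the original code; `quote_bytes` and `quote_text` are names invented here, and the sample strings are arbitrary.

```python
def quote_bytes(arg: bytes) -> bytes:
    """Byte-level quoting, mirroring the question's PHP function."""
    if b"\x00" in arg:
        raise ValueError("NUL bytes cannot be quoted in shell arguments")
    # Each ' becomes '\'' : close quote, escaped quote, reopen quote.
    return b"'" + arg.replace(b"'", b"'\\''") + b"'"


def quote_text(arg: str) -> str:
    """Character-level quoting of the same argument."""
    if "\x00" in arg:
        raise ValueError("NUL bytes cannot be quoted in shell arguments")
    return "'" + arg.replace("'", "'\\''") + "'"


# Arbitrary samples mixing single quotes with 2-, 3- and 4-byte characters.
samples = ["it's", "naïve 'café'", "日本語'テスト", "😀'😀"]

same = all(
    quote_bytes(s.encode("utf-8")) == quote_text(s).encode("utf-8")
    for s in samples
)
print(same)  # True
```

Because 0x27 never occurs inside a multi-byte sequence, the byte-level replace cannot split a character or miss a quote, so the two approaches agree on every sample.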

Upvotes: 3
