JeffUK

Python \0 in a string followed by a number behaves inconsistently

In a Python string, an octal escape such as \0 can be up to three digits long.

Is there any way to enter an octal value of only 1 character?

For instance:

If I want to print \0 followed by "Hello", I can do:

"\0Hello"

But if I want to print \0 followed by "12345", I can't do:

"\012345"

Instead, I have to do:

"\00012345"

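A quick way to see what the parser actually produced is to print the repr of each literal. The sketch below only restates the three literals from the question; the variable names are illustrative.

```python
# In ordinary (non-raw) string literals, the parser consumes up to
# three octal digits after a backslash.
hello = "\0Hello"     # 'H' is not an octal digit, so the escape is just \0
digits = "\012345"    # greedy: \012 (octal 12 == decimal 10, a newline), then "345"
padded = "\00012345"  # \000 uses all three digits, so "12345" survives intact

print(repr(hello))   # '\x00Hello'
print(repr(digits))  # '\n345'
print(repr(padded))  # '\x0012345'
```

So "\012345" is not a NUL followed by five digits; it is a four-character string starting with a newline.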
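This can, in very obscure scenarios, lead to inconsistent behaviour, for example when text containing escape sequences is assembled from pieces and only then decoded with unicode_escape: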

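def parseAsString(characters):
    # Build the text as separate characters: 'H', 'I', '!', a literal backslash, '0'.
    output = ['H', 'I', '!', '\\', '0'] + characters
    # unicode_escape turns backslash + '0' into NUL, but it also swallows
    # up to two more octal digits if the caller's characters start with them.
    print("".join(output).encode().decode('unicode_escape'))

parseAsString(['Y', 'O', 'U'])
#Output (the NUL character shows as a gap):
#>HI! YOU

parseAsString(['1', '2', '3'])
#Output (\012 was decoded as a newline, leaving only '3'):
#>HI!
#>3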
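To see why the second call splits across two lines, it can help to print the joined text before and after decoding. This is a small diagnostic sketch reusing the list from parseAsString; the `results` dictionary is only there to collect what was printed.

```python
# The same prefix as parseAsString: 'H', 'I', '!', a literal backslash, '0'.
prefix = ['H', 'I', '!', '\\', '0']

results = {}
for tail in (['Y', 'O', 'U'], ['1', '2', '3']):
    raw = "".join(prefix + tail)                  # text with an undecoded \0 escape
    decoded = raw.encode().decode('unicode_escape')
    results[raw] = decoded
    print(repr(raw), '->', repr(decoded))
```

The first tail decodes to a NUL followed by "YOU"; in the second, the codec reads "\012" as a single octal escape (a newline), so only "3" is left over.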

Answers (1)

JeffUK

The answer is that, when you're dealing with \0, you should either:

  1. Always remember to explicitly use \000 or \x00. This may not be possible if your raw text is coming from another source.

  2. When dealing with raw strings (text whose escape sequences have not been decoded yet) AND concatenating them, always decode each constituent part first and concatenate last, not the other way around.
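Option 1 works because \000 already uses the maximum of three octal digits, and \x always takes exactly two hex digits, so neither form can absorb the digits that follow. A minimal sketch, assuming the raw text is decoded with unicode_escape as in the question; `decode_escapes` is a hypothetical helper name.

```python
def decode_escapes(text):
    # Hypothetical helper: interpret backslash escapes the way parseAsString does.
    return text.encode().decode('unicode_escape')

octal_short = decode_escapes("HI!\\0" + "123")    # \012 is consumed: newline, then '3'
octal_full = decode_escapes("HI!\\000" + "123")   # \000 is already three digits
hex_form = decode_escapes("HI!\\x00" + "123")     # \x takes exactly two hex digits

print(repr(octal_short))  # 'HI!\n3'
print(repr(octal_full))   # 'HI!\x00123'
print(repr(hex_form))     # 'HI!\x00123'
```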

For instance, the parser does this for you when you concatenate string literals:

 "\0" + "Hello"

and

 "\0" + "12345"

Both work consistently, as expected, because "\0" is converted to "\x00" when the literal is parsed, before it is concatenated with the rest of the string.
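The same holds for implicit concatenation of adjacent literals: each literal's escapes are processed on its own, and only then are the pieces joined. A short check (variable names are illustrative):

```python
# Escapes are resolved per literal, before any concatenation happens.
plus_hello = "\0" + "Hello"
plus_digits = "\0" + "12345"
# Adjacent literals are joined after each one has been parsed, too.
adjacent_digits = "\0" "12345"

print(repr(plus_hello))       # '\x00Hello'
print(repr(plus_digits))      # '\x0012345'
print(repr(adjacent_digits))  # '\x0012345'
```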

Or, in the more obscure scenario:

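def safeParseAsString(characters):
    # Decode the fixed prefix on its own, so its \0 becomes NUL before any digits follow it.
    output = "".join(['H', 'I', '!', '\\', '0']).encode().decode('unicode_escape')
    # Then decode the caller's characters separately and append the result.
    output += "".join(characters).encode().decode('unicode_escape')
    print(output)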

safeParseAsString(['Y','O','U'])
#Output:
#>HI! YOU

safeParseAsString(['1','2','3'])
#Output:
#>HI! 123
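The same "decode first, concatenate last" idea can be generalised to any number of fragments. This is a sketch; `decode_parts` is a hypothetical helper, not part of any library.

```python
def decode_parts(parts):
    """Hypothetical helper: decode each escaped fragment on its own, then join.

    Decoding per fragment keeps an escape at the end of one fragment from
    absorbing digits at the start of the next.
    """
    return "".join(part.encode().decode('unicode_escape') for part in parts)

naive = "".join(["HI!\\0", "123"]).encode().decode('unicode_escape')
safe = decode_parts(["HI!\\0", "123"])

print(repr(naive))  # 'HI!\n3'
print(repr(safe))   # 'HI!\x00123'
```

The fragments should be logical units (a complete escape sequence must not be split across two parts): decoding a fragment that ends in a lone backslash, or splitting "\\" and "n" into separate parts, would fail or change the meaning.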
