bonsaiviking

Reputation: 6005

Why can't Python's string.format pad with "\x00"?

I wanted to pad a string with null characters ("\x00"). I know lots of ways to do this, so please do not answer with alternatives. What I want to know is: Why does Python's string.format() function not allow padding with nulls?

Test cases:

>>> "{0:\x01<10}".format("bbb")
'bbb\x01\x01\x01\x01\x01\x01\x01'

This shows that hex-escaped characters work in general.

>>> "{0:\x00<10}".format("bbb")
'bbb       '

But "\x00" gets turned into a space ("\x20").

>>> "{0:{1}<10}".format("bbb","\x00")
'bbb       '
>>> "{0:{1}<10}".format("bbb",chr(0))
'bbb       '

Supplying the fill character in a couple of other ways gives the same result.

>>> "bbb" + "\x00" * 7
'bbb\x00\x00\x00\x00\x00\x00\x00'

This works, but doesn't use string.format.

>>> spaces = "{0: <10}".format("bbb")
>>> nulls  = "{0:\x00<10}".format("bbb")
>>> spaces == nulls
True

Python is clearly substituting spaces (chr(0x20)) instead of nulls (chr(0x00)).

Upvotes: 9

Views: 4346

Answers (3)

sbrodie

Reputation: 21

The answer to the original question is that it was a bug in Python.

Using a null as the fill character was documented as being permitted, but it did not work. The bug was fixed in 2014. For Python 2, the fix first appeared in either 2.7.7 or 2.7.8 (I'm not sure how to tell which).

Original tracked issue.
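If you are unsure whether a given interpreter includes the fix, a quick runtime check is one way to find out. This is only a sketch: the helper name null_fill_works is made up for illustration, and it assumes nothing beyond the behavior shown in the question.

```python
import sys


def null_fill_works():
    # Hypothetical helper: True if str.format honors a null fill character.
    # On an affected interpreter the left side comes back padded with
    # spaces, so the comparison is False.
    return "{0:\x00<10}".format("bbb") == "bbb" + "\x00" * 7


print(sys.version_info[:3], null_fill_works())
```

On an interpreter that still has the bug, this should print False next to the version number.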

Upvotes: 2

cmd

Reputation: 5830

Because the string.format method in Python 2.7 is a backport of Python 3's string.format. The Python 2.7 unicode type corresponds to the Python 3 string, and the Python 2.7 string corresponds to Python 3 bytes. In Python 3, a string is the wrong type for expressing binary data. You would use bytes, which has no format method. So really you should be asking why the format method is on string at all in 2.7, when it should have been only on the unicode type, since that is what became the string in Python 3.

I guess the answer is that it is too convenient to have it there.

A related matter is why there is no format method on bytes yet.
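The str/bytes split described above can be seen directly in a Python 3 interpreter. This is a small sketch, assuming Python 3; the variable names are just for illustration.

```python
# In Python 3, text (str) has a format method; binary data (bytes) does not.
str_has_format = hasattr(str, "format")
bytes_has_format = hasattr(bytes, "format")

print("str.format exists:  ", str_has_format)
print("bytes.format exists:", bytes_has_format)
```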

Upvotes: 0

bonsaiviking

Reputation: 6005

Digging into the source code for Python 2.7, I found that the issue is in this section from ./Objects/stringlib/formatter.h, lines 718-722 (in version 2.7.3):

/* Write into that space. First the padding. */
p = fill_padding(STRINGLIB_STR(result), len,
                 format->fill_char=='\0'?' ':format->fill_char,
                 lpad, rpad);

The trouble is that a zero/null character ('\0') is being used as a default when no padding character is specified. This is to enable this behavior:

>>> "{0:<10}".format("foo")
'foo       '

It may be possible to set format->fill_char = ' '; as the default in parse_internal_render_format_spec() at ./Objects/stringlib/formatter.h:186, but there's some bit about backwards compatibility that checks for '\0' later on. In any case, my curiosity is satisfied. I will accept someone else's answer if it has more history or a better explanation for why than this.
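To make the sentinel problem concrete, here is a rough Python model of the logic in the C snippet. It is not the CPython code or the actual upstream fix; both function names are hypothetical, and the second variant only illustrates the general idea of using a sentinel that cannot collide with a real fill character.

```python
def pad_buggy(value, width, fill_char="\0"):
    # Models the C check: "\0" doubles as "no fill character given",
    # so an explicit null fill is indistinguishable from the default.
    effective = " " if fill_char == "\0" else fill_char
    return value + effective * max(width - len(value), 0)


def pad_distinct_sentinel(value, width, fill_char=None):
    # Assumption for illustration: None marks "not specified",
    # leaving "\0" free to be used as a real fill character.
    effective = " " if fill_char is None else fill_char
    return value + effective * max(width - len(value), 0)


print(repr(pad_buggy("bbb", 10, "\0")))
print(repr(pad_distinct_sentinel("bbb", 10, "\0")))
```

With the first function an explicit null is silently replaced by a space, which matches the behavior in the question; the second keeps the null because "unspecified" is represented by a value outside the set of valid characters.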

Upvotes: 4
