Reputation: 6005
I wanted to pad a string with null characters ("\x00"). I know lots of ways to do this, so please do not answer with alternatives. What I want to know is: why does Python's string.format() function not allow padding with nulls?
Test cases:
>>> "{0:\x01<10}".format("bbb")
'bbb\x01\x01\x01\x01\x01\x01\x01'
This shows that hex-escaped characters work in general.
>>> "{0:\x00<10}".format("bbb")
'bbb       '
But "\x00" gets turned into a space ("\x20").
>>> "{0:{1}<10}".format("bbb","\x00")
'bbb       '
>>> "{0:{1}<10}".format("bbb",chr(0))
'bbb       '
The same thing happens when the fill character is passed in as an argument, whichever way it is written.
>>> "bbb" + "\x00" * 7
'bbb\x00\x00\x00\x00\x00\x00\x00'
This works, but doesn't use string.format.
>>> spaces = "{0: <10}".format("bbb")
>>> nulls = "{0:\x00<10}".format("bbb")
>>> spaces == nulls
True
Python is clearly substituting spaces (chr(0x20)) instead of nulls (chr(0x00)).
Upvotes: 9
Views: 4346
Reputation: 21
The answer to the original question is that it was a bug in Python.
Padding with nulls was documented as being permitted, but it did not work. It was fixed in 2014. For Python 2, the fix first appeared in either 2.7.7 or 2.7.8 (I'm not sure how to tell which).
Original tracked issue.
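If you are unsure whether a given interpreter has the fix, you can check the behaviour directly instead of comparing version numbers. This is a minimal sketch; null_fill_is_fixed is a name made up for illustration, not something from the standard library.

```python
# Hypothetical helper: detects whether this interpreter honours a null fill.
def null_fill_is_fixed():
    padded = "{0:\x00<10}".format("bbb")
    # A fixed interpreter pads with seven nulls; an affected one pads with spaces.
    return padded == "bbb" + "\x00" * 7

print(null_fill_is_fixed())
```

On an affected 2.7 build this should print False, because the padding comes back as spaces.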
Upvotes: 2
Reputation: 5830
Because the string.format method in Python 2.7 is a backport of the Python 3 string.format. The Python 2.7 unicode type is the Python 3 string, whereas the Python 2.7 string is the Python 3 bytes. A string is the wrong type for expressing binary data in Python 3; you would use bytes, which has no format method. So really you should be asking why the format method is on string at all in 2.7, when it should only have been on the unicode type, since that is what became the string in Python 3.
I guess the answer is that it is too convenient to have it there.
A related matter is why there is no format method on bytes yet.
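To make the str/bytes split concrete, here is a small Python 3 sketch. It only shows that bytes lacks a format method and that null padding of binary data goes through bytes.ljust instead; the variable name padded is just for illustration.

```python
# Python 3: str has a format method, bytes does not.
print(hasattr(str, "format"))    # True
print(hasattr(bytes, "format"))  # False

# bytes.ljust takes a single fill byte, and the null byte is accepted.
padded = b"bbb".ljust(10, b"\x00")
print(padded)
```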
Upvotes: 0
Reputation: 6005
Digging into the source code for Python 2.7, I found that the issue is in this section of ./Objects/stringlib/formatter.h, lines 718-722 (in version 2.7.3):
/* Write into that space. First the padding. */
p = fill_padding(STRINGLIB_STR(result), len,
                 format->fill_char=='\0'?' ':format->fill_char,
                 lpad, rpad);
The trouble is that a zero/null character ('\0') is being used as the default when no padding character is specified, so an explicitly requested null fill cannot be told apart from no fill at all. The default exists to enable this behavior:
>>> "{0:<10}".format("foo")
'foo       '
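The effect of using '\0' as the "nothing specified" marker can be modelled in a few lines of Python. This is only a sketch of the logic in the C snippet, not the real parser: parse_fill and render are hypothetical names, and only the '<' alignment is handled.

```python
# Simplified model of the 2.7.3 logic quoted above (not the real C code).
# parse_fill and render are hypothetical helper names.

def parse_fill(spec):
    """Return (fill_char, width) for specs such as '<10' or 'x<10'."""
    fill_char = "\0"              # '\0' doubles as "no fill character given"
    if len(spec) >= 2 and spec[1] == "<":
        fill_char = spec[0]       # explicit fill, which may itself be '\0'
        spec = spec[1:]
    width = int(spec[1:])         # drop the '<' and read the width
    return fill_char, width

def render(value, spec):
    fill_char, width = parse_fill(spec)
    # Same test as the C ternary: a '\0' fill is replaced by a space.
    pad = " " if fill_char == "\0" else fill_char
    return value + pad * max(width - len(value), 0)

print(repr(render("bbb", "\x01<10")))
print(repr(render("bbb", "\x00<10")))
```

In this model an explicit "\x00" fill and a missing fill both reach render as '\0', so both come out padded with spaces.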
It may be possible to set format->fill_char = ' '; as the default in parse_internal_render_format_spec() at ./Objects/stringlib/formatter.h:186, but there's some bit about backwards compatibility that checks for '\0' later on. In any case, my curiosity is satisfied. I will accept someone else's answer if it has more history or a better explanation of why than this.
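For what it's worth, here is a rough Python sketch of the "default to a space at parse time" idea. It is an illustration under my own assumptions, not the patch that went into CPython: the function names are made up, only '<' alignment is handled, and the backwards-compatibility check mentioned above is ignored.

```python
# Sketch only: the default fill is chosen while parsing, so rendering never
# has to guess whether '\0' meant "unspecified". Names are hypothetical.

def parse_fill_default_space(spec):
    """Return (fill_char, width) for specs such as '<10' or 'x<10'."""
    fill_char = " "               # default decided here, not at render time
    if len(spec) >= 2 and spec[1] == "<":
        fill_char = spec[0]       # an explicit '\0' is kept as it is
        spec = spec[1:]
    width = int(spec[1:])         # drop the '<' and read the width
    return fill_char, width

def render_default_space(value, spec):
    fill_char, width = parse_fill_default_space(spec)
    return value + fill_char * max(width - len(value), 0)

print(repr(render_default_space("bbb", "\x00<10")))
print(repr(render_default_space("foo", "<10")))
```

With the default moved into the parser, the null fill survives and the no-fill case still pads with spaces.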
Upvotes: 4