I am reading some utf-8 encoded data from a file like so:
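with open(filename, 'rb') as f:
    f.seek(offset)         # file.read() takes a single size argument, so seek first
    data = f.read(length)
# data is b'hello\x00\x00\x00\x00'
text = data.decode('utf-8')
# text is 'hello\x00\x00\x00\x00' (it displays as 'hello ')
stripped_text = text.strip()
# stripped_text is still 'hello\x00\x00\x00\x00'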
You can reproduce this with a single line:
thing = b'hello\x00\x00\x00\x00'.decode('utf8').strip()
print(thing)
# prints hello followed by four NUL characters, which show up as blank space
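Printing the string hides the NULs, which is why the output only looks like trailing whitespace. Calling repr() makes them visible. This check uses nothing beyond the sample bytes from the question:

```python
thing = b'hello\x00\x00\x00\x00'.decode('utf8').strip()

# print() renders NUL as nothing visible; repr() shows each one as \x00
print(repr(thing))  # 'hello\x00\x00\x00\x00'

# 5 letters + 4 NULs: strip() removed nothing
print(len(thing))   # 9
```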
As you can see, the trailing NUL characters are not stripped. I assume this is because .strip() does not recognize '\x00', yet everything I have read suggests that it should. What gives? How can I remove these characters without resorting to something clunky?
I couldn't find a post that addresses this issue.
NULs are not whitespace, so strip() with no arguments will not strip them. You should instead use strip('\0'):
>>> 'hello\0\0\0\0'.strip('\0')
'hello'
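A slightly fuller sketch of the same idea, applied to the file-reading code from the question. The helper name read_padded_text and its parameters are hypothetical, and it assumes the field is NUL-padded on the right only. It therefore removes the padding from the raw bytes with bytes.rstrip(b'\x00') before decoding. The sketch also shows rstrip('\0') for trailing NULs only, and strip() with a combined character set for fields that may carry both whitespace and NULs. strip(chars) removes any mix of the given characters from both ends.

```python
import os
import string
import tempfile

# Characters to strip: ASCII whitespace plus NUL.
WHITESPACE_AND_NUL = string.whitespace + '\x00'


def read_padded_text(filename, offset, length, encoding='utf-8'):
    """Hypothetical helper: read a right-NUL-padded text field from a binary file."""
    with open(filename, 'rb') as f:
        f.seek(offset)          # move to the start of the field
        data = f.read(length)   # read exactly the field's bytes
    # Drop the NUL padding on the raw bytes, then decode.
    return data.rstrip(b'\x00').decode(encoding)


# NUL is not whitespace, so the no-argument strip() leaves it alone.
print('\x00'.isspace())                                  # False

# rstrip('\0') removes only trailing NULs; strip('\0') would remove leading ones too.
print(repr('\0hello\0\0'.rstrip('\0')))                  # '\x00hello'

# Strip whitespace and NULs together, in any order or mix.
print(repr(' hello\0 \0\n'.strip(WHITESPACE_AND_NUL)))   # 'hello'

# Try the helper on a throwaway file: 2 junk bytes, then the padded field.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'XXhello\x00\x00\x00\x00')
    path = tmp.name
try:
    print(repr(read_padded_text(path, 2, 9)))            # 'hello'
finally:
    os.remove(path)
```

Stripping the bytes before decoding is safe for UTF-8, because a 0x00 byte never occurs inside a multibyte UTF-8 sequence. For an encoding such as UTF-16, decode first and then strip '\0' from the resulting string.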