Nate

Reputation: 105

.strip() method not stripping mystery whitespace characters

I am reading some utf-8 encoded data from a file like so:

with open(filename, 'rb') as f:
    # read() accepts only a size, so seek to the offset first
    f.seek(offset)
    data = f.read(length)
    # data is b'hello\x00\x00\x00\x00'
    text = data.decode('utf-8')
    # text is 'hello    '
    stripped_text = text.strip()
    # stripped_text is 'hello    '

You can recreate this with a simple line like

thing = b'hello\x00\x00\x00\x00'.decode('utf8').strip()
print(thing)
#the output is 'hello    '

As you can see, the trailing NUL characters are not stripped. I assume this is because .strip() does not recognize '\x00', but everything I have read suggests that it should. What gives? How can I remove these characters without resorting to something very clunky?

I couldn't find a post which addressed this issue.

Upvotes: 1

Views: 154

Answers (1)

jwodder

Reputation: 57480

NULs are not whitespace, so strip() with no arguments will not strip them. You should instead use strip('\0'):

>>> 'hello\0\0\0\0'.strip('\0')
'hello'
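If the data comes from a NUL-padded fixed-width field, as the question suggests, another option is to drop the padding at the byte level before decoding. Below is a minimal sketch of that idea; the helper name decode_nul_padded is hypothetical and not part of any library, and it assumes the padding only appears at the end of the field.

```python
def decode_nul_padded(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a NUL-padded field (hypothetical helper), dropping trailing NULs."""
    # Remove the trailing padding bytes first, then decode what is left.
    return raw.rstrip(b"\x00").decode(encoding)


# NUL is not classified as whitespace, which is why a bare strip() leaves it alone.
print("\x00".isspace())                              # False
print(decode_nul_padded(b"hello\x00\x00\x00\x00"))   # hello
```

Using rstrip rather than strip only touches the end of the field, so a leading NUL that is part of the data would be left alone.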

Upvotes: 5
