Luffy Cyliu

Reputation: 853

How to remove those "\x00\x00"

How can I remove the "\x00\x00" sequences from a string? I have many such strings (example shown below). I can use re.sub to replace each "\x00", but I am wondering whether there is a better way to do that. Converting between unicode, bytes and str is always confusing.

'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.

Upvotes: 60

Views: 156066

Answers (7)

warownia1

Reputation: 2935

If you are dealing with a zero-padded buffer, you can use rstrip to remove the trailing \x00 characters:

>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'

This removes all \x00 characters at the end of the string but keeps any nulls in the middle. It is not suitable for null-terminated strings that may contain random data after the terminator.

If you are dealing with a null-terminated string, where the first zero marks the end of the string but other characters may follow it, you should use anregen's solution:

>>> text = 'Hello\x00\x24\x4e\x32'
>>> text.split('\x00', 1)[0]
'Hello'

This splits the text at the first null and returns the part before it. It also works with strings that contain no null character.
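To make the difference between the two cases concrete, here is a minimal sketch that wraps each approach in a small helper. The function names strip_padding and cut_at_terminator are made up for illustration; they are not from any library.

```python
def strip_padding(text: str) -> str:
    """Drop trailing NULs only; NULs in the middle are kept."""
    return text.rstrip('\x00')


def cut_at_terminator(text: str) -> str:
    """Keep everything before the first NUL, C-string style."""
    return text.split('\x00', 1)[0]


padded = 'Hello\x00\x00\x00\x00'       # zero-padded buffer
terminated = 'Hello\x00\x24\x4e\x32'   # terminator followed by leftover data

print(repr(strip_padding(padded)))          # 'Hello'
print(repr(cut_at_terminator(terminated)))  # 'Hello'
# rstrip alone leaves the leftover data in place:
print(repr(strip_padding(terminated)))      # 'Hello\x00$N2'
```

Which helper fits depends on how the buffer was produced, so it is worth checking the data format before picking one.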

EDIT:
Explained rstrip in more detail and provided a correct use case.
Included alternative solution.

Upvotes: 80

Jameel Siddiq

Reputation: 11

Neil wrote, '...you might want to put some thought into why you have them in the first place.' In my case, following that advice led me to the cause: the file I was reading from had been saved as Unicode. Once I re-saved the file as plain ASCII text, the problem was solved.
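One plausible way this happens (an assumption here, since the answer only says "unicode") is that the file was UTF-16 and was decoded with a single-byte codec. A short sketch of the effect, using latin-1 as the stand-in wrong codec:

```python
# ASCII text encoded as UTF-16 (little-endian, no BOM) has a NUL byte
# after every character.
raw = 'Hello'.encode('utf-16-le')
print(raw)          # b'H\x00e\x00l\x00l\x00o\x00'

# Decoding with a single-byte codec keeps those NULs in the string.
wrong = raw.decode('latin-1')
print(repr(wrong))  # 'H\x00e\x00l\x00l\x00o\x00'

# Decoding with the matching codec gives clean text.
right = raw.decode('utf-16-le')
print(repr(right))  # 'Hello'
```

If that is the cause, passing the right encoding when opening the file may be a cleaner fix than stripping the NULs afterwards.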

Upvotes: 1

apc

Reputation: 1

I ran into this problem when copying lists out of Excel. The process was:

  • Copy a list of ID numbers sent to me in Excel
  • Run a set of Python code that:
    • Read the clipboard as text
    • Called txt.split('\n') to give a list
    • Processed each element in the list (updating the production system as required)

The problem was that reading the clipboard intermittently returned multiple '\x00' characters at the end of the text.

I have since switched from win32clipboard to pyperclip to read the clipboard, and that seems to have resolved the problem.

Upvotes: -1

Alex

Reputation: 11

I tried strip and rstrip and they didn't work, but this did: use split and then join the resulting list:

if '\x00' in name:
    name = ' '.join(name.split('\x00'))
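Note that joining with a space turns every NUL into a space, so runs of NULs become runs of spaces. A small sketch of that behaviour, plus a variant that drops the empty pieces first (the sample value is invented for illustration):

```python
name = 'John\x00\x00Smith\x00\x00'

# Plain split/join: each NUL becomes one space.
spaced = ' '.join(name.split('\x00'))
print(repr(spaced))   # 'John  Smith  '

# Skipping empty pieces gives single spaces and no trailing blanks.
compact = ' '.join(part for part in name.split('\x00') if part)
print(repr(compact))  # 'John Smith'
```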

Upvotes: 1

sarlacii

Reputation: 563

Building on the answers supplied, I suggest that strip() is more generic than rstrip() for cleaning up a data packet, as strip() removes chars from the beginning and the end of the supplied string, whereas rstrip() simply removes chars from the end of the string.

However, strip() does not treat NUL chars as whitespace by default, so you need to specify them explicitly. This can catch you out, as print() will of course not show the NUL chars. My solution was to clean the string using ".strip().strip('\x00')":

>>> arbBytesFromSocket = b'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii')
>>> print(arbBytesAsString)
hello
>>> str(arbBytesAsString)
'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii').strip().strip('\x00')
>>> str(arbBytesAsString)
'hello'
>>>

This gives you the required string without the NUL chars on each end, while preserving any NUL chars inside the "data packet". That is useful for received byte data that may contain valid NUL chars (e.g. a C-type structure). NB: in this case the packet must be "wrapped", i.e. surrounded by non-NUL chars (prefix and suffix), so that only the unwanted NUL chars are stripped.
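The same idea also works directly on the bytes object before decoding, since bytes.strip accepts a bytes argument. A minimal sketch, with an invented packet that has one NUL in the middle:

```python
packet = b'\x00\x00\x00\x00hello\x00world\x00\x00\x00\x00'

# Strip NUL bytes from both ends; the interior NUL is untouched.
payload = packet.strip(b'\x00')
print(payload)     # b'hello\x00world'

# Decode afterwards if a str is needed.
text = payload.decode('ascii')
print(repr(text))  # 'hello\x00world'
```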

Upvotes: 12

anregen

Reputation: 1602

I think the more general solution is to use:

cleanstring = nullterminatedstring.split('\x00',1)[0]

This splits the string on \x00 at most once. When a null is present, split(...) returns a two-element list: everything before the first null and everything after it (the delimiter itself is removed). Indexing with [0] returns only the portion of the string before the first null (\x00) character, which I believe is what you're looking for.

The convention in some languages, specifically C-like ones, is that a single null character marks the end of the string. For example, you should also expect to see strings that look like:

'Hello\x00dpiecesofsomeoldstring\x00\x00\x00'

The answer supplied here will handle that situation as well as the other examples.
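As an aside, str.partition is a close alternative that always returns three parts, which can be handy if the text after the terminator is also of interest. A quick sketch on the example string above:

```python
raw = 'Hello\x00dpiecesofsomeoldstring\x00\x00\x00'

# partition splits on the first NUL only.
head, sep, tail = raw.partition('\x00')
print(repr(head))  # 'Hello'
print(repr(tail))  # 'dpiecesofsomeoldstring\x00\x00\x00'

# With no NUL present, the whole string ends up in the first part.
print('Hello'.partition('\x00'))  # ('Hello', '', '')
```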

Upvotes: 14

galaxyan

Reputation: 6141

>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' 
>>> a.replace('\x00','')
'Hello'

Upvotes: 70
