Reputation: 853
How to remove those "\x00\x00" in a string?
I have many of those strings (example shown below). I can use re.sub to replace those "\x00", but I am wondering whether there is a better way to do that. Converting between unicode, bytes and string is always confusing.
'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'.
Upvotes: 60
Views: 156066
Reputation: 2935
If you are dealing with a zero-padded buffer, you can use rstrip to remove the trailing \x00 characters:
>>> text = 'Hello\x00\x00\x00\x00'
>>> text.rstrip('\x00')
'Hello'
It removes all \x00 characters at the end of the string but keeps any nulls in the middle. It is not suitable for null-terminated strings that may contain random data after the terminator.
If you are dealing with a null-terminated string where the first zero indicates the end of string, but there might be other characters following it, you should use anregen's solution.
>>> text = 'Hello\x00\x24\x4e\x32'
>>> text.split('\x00', 1)[0]
'Hello'
It splits the text at the first zero and returns the slice. It works with strings having no null character too.
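The two cases above can be wrapped in small helpers so the intent is visible at the call site. This is only a sketch; the function names `strip_padding` and `cut_at_terminator` are made up for illustration.

```python
def strip_padding(text):
    """Remove trailing NUL padding; NULs inside the text are kept."""
    return text.rstrip('\x00')


def cut_at_terminator(text):
    """Keep everything before the first NUL (C-style terminated string)."""
    return text.split('\x00', 1)[0]


print(repr(strip_padding('Hello\x00\x00\x00\x00')))      # 'Hello'
print(repr(cut_at_terminator('Hello\x00\x24\x4e\x32')))  # 'Hello'
# rstrip alone leaves the data after the terminator in place:
print(repr(strip_padding('Hello\x00\x24\x4e\x32')))      # 'Hello\x00$N2'
```

The last line shows why rstrip is the wrong tool for a null-terminated string: it only removes NULs from the very end.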
EDIT:
Explained rstrip in more detail and provided a correct use case.
Included alternative solution.
Upvotes: 80
Reputation: 11
Neil wrote, '...you might want to put some thought into why you have them in the first place.' In my case, that advice led me to the cause: the saved file I was reading from was in Unicode. Once I re-saved the file as plain ASCII text, the problem was solved.
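One common way this happens is UTF-16 text being decoded as if it were a single-byte encoding, which leaves a \x00 after every ASCII character. That the file here was UTF-16 is an assumption, since the answer only says "unicode". A small sketch of the effect, and of decoding with the matching codec instead of re-saving the file:

```python
raw = 'Hello'.encode('utf-16-le')   # bytes as they might sit in a UTF-16 file

wrong = raw.decode('latin-1')       # single-byte decode keeps a NUL after each letter
right = raw.decode('utf-16-le')     # decoding with the matching codec

print(repr(wrong))   # 'H\x00e\x00l\x00l\x00o\x00'
print(repr(right))   # 'Hello'
```

When reading from disk, the same idea would be to pass the encoding to open(), for example open(path, encoding='utf-16'), where path is a placeholder.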
Upvotes: 1
Reputation: 1
I ran into this problem when copying lists out of Excel. Process was:
The problem was that reading the clipboard intermittently returned multiple '\x00' at the end of the text.
I have changed from using win32clipboard to using pyperclip to read the clipboard, and that seems to have resolved the problem.
Upvotes: -1
Reputation: 11
I tried strip and rstrip and they didn't work, but this one did: use split and then join the resulting list:
if '\x00' in name:
    # Replace every NUL with a single space
    name = ' '.join(name.split('\x00'))
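A possible reason strip and rstrip appeared not to work (a guess, since the answer does not show the calls) is that, without an argument, they only remove whitespace, and NUL does not count as whitespace. They also never touch NULs in the middle of a string. A short sketch with a made-up value for `name`:

```python
name = 'first\x00second\x00\x00'

print('\x00'.isspace())                      # False: NUL is not whitespace
print(repr(name.strip()))                    # unchanged, NULs still there
print(repr(name.strip('\x00')))              # 'first\x00second' (interior NUL stays)
print(repr(' '.join(name.split('\x00'))))    # 'first second  '
```

Note that the split/join version turns each NUL into a space, so trailing NULs become trailing spaces; a final .strip() would remove those.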
Upvotes: 1
Reputation: 563
Building on the answers supplied, I suggest that strip() is more generic than rstrip() for cleaning up a data packet, as strip() removes characters from both the beginning and the end of the supplied string, whereas rstrip() only removes characters from the end of the string.
However, strip() does not treat NUL characters as whitespace by default, so you need to specify them explicitly. This can catch you out, as print() output will usually not show the NUL characters visibly. The solution I used was to clean the string using .strip().strip('\x00'):
>>> arbBytesFromSocket = b'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii')
>>> print(arbBytesAsString)
hello
>>> str(arbBytesAsString)
'\x00\x00\x00\x00hello\x00\x00\x00\x00'
>>> arbBytesAsString = arbBytesFromSocket.decode('ascii').strip().strip('\x00')
>>> str(arbBytesAsString)
'hello'
>>>
This gives you the string/byte array required, without the NUL chars on each end, and also preserves any NUL chars inside the "data packet", which is useful for received byte data that may contain valid NUL chars (eg. a C-type structure). NB. In this case the packet must be "wrapped", i.e. surrounded by non-NUL chars (prefix and suffix), to allow correct detection, and thus only strip unwanted NUL chars.
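The same cleanup can also be done on the bytes object before decoding, since bytes.strip accepts a bytes argument. This is a sketch with a made-up packet that contains one interior NUL, to show that it survives:

```python
packet = b'\x00\x00\x00\x00he\x00llo\x00\x00\x00\x00'

# Strip NUL bytes from both ends; the argument must be bytes, not str
cleaned = packet.strip(b'\x00')
print(cleaned)                        # b'he\x00llo'

text = cleaned.decode('ascii')
print(repr(text))                     # 'he\x00llo'
```

Stripping first and decoding afterwards avoids carrying the padding through the decode step, but either order gives the same result for ASCII data.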
Upvotes: 12
Reputation: 1602
I think the more general solution is to use:
cleanstring = nullterminatedstring.split('\x00',1)[0]
This will split the string using \x00 as the delimiter, 1 time. split(...) returns a 2-element list: everything before the null and everything after the null (the delimiter itself is removed). If the string contains no null at all, it returns a 1-element list holding the whole string. Appending [0] returns only the portion of the string before the first null (\x00) character, which I believe is what you're looking for.
The convention in some languages, specifically C-like, is that a single null character marks the end of the string. For example, you should also expect to see strings that look like:
'Hello\x00dpiecesofsomeoldstring\x00\x00\x00'
The answer supplied here will handle that situation as well as the other examples.
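As an alternative to split (not what the answer above uses), str.partition does the same single cut and always returns three parts, which can be convenient if the leftover data is also of interest. A minimal sketch using the example string from above:

```python
raw = 'Hello\x00dpiecesofsomeoldstring\x00\x00\x00'

# partition cuts at the first NUL only
head, sep, tail = raw.partition('\x00')
print(repr(head))   # 'Hello'
print(repr(tail))   # 'dpiecesofsomeoldstring\x00\x00\x00'

# With no NUL present, the whole string comes back as the head
print('Hello'.partition('\x00'))   # ('Hello', '', '')
```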
Upvotes: 14
Reputation: 6141
>>> a = 'Hello\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> a.replace('\x00','')
'Hello'
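Unlike rstrip or split, replace removes every NUL, including any in the middle of the string. The same call works on bytes as long as both arguments are bytes as well. A brief sketch with made-up values:

```python
s = 'He\x00llo\x00\x00'
b = b'He\x00llo\x00\x00'

print(repr(s.replace('\x00', '')))    # 'Hello'
print(b.replace(b'\x00', b''))        # b'Hello'

# Mixing the two types does not work: bytes.replace rejects str arguments
try:
    b.replace('\x00', '')
except TypeError as exc:
    print('TypeError:', exc)
```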
Upvotes: 70