user2292661

Reputation: 155

Best way to replace \x00 in list of strings?

I have a list of values from a parsed PE file that include \x00 null bytes at the end of each section. I want to remove the \x00 bytes from the strings without removing every "x" from them. I have tried .replace() and re.sub(), but without much success.

Using Python 2.6.6

Example.

import re

List = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

count = 0
while count < len(List):
    test = re.sub('\\\\x00', '', str(List[count]))
    print test
    count += 1

Output:

['.text']
['.data']
['.rsrc']

I want to get the following output:

.text
.data
.rsrc

Any ideas on the best way of going about this?

Upvotes: 9

Views: 34326

Answers (6)

martineau

Reputation: 123501

What you're really wanting to do is replace '\x00' characters in strings in a list.

Towards that goal, people often overlook the fact that in Python 2 the non-Unicode string translate() method can also delete 8-bit characters, either in addition to translating or, when the table argument is None, instead of it, as illustrated below. (str.translate() doesn't accept this second argument in Python 3 because strings are Unicode objects by default.)

Your List data structure seems a little odd, since it's a list of one-element lists consisting of just single strings. Regardless, in the code below I've renamed it sections since Capitalized words should only be used for the names of classes according to PEP 8 -- Style Guide for Python Code.

sections = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

for section in sections:
    test = section[0].translate(None, '\x00')
    print test

Output:

.text
.data
.rsrc
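
For readers on Python 3, a minimal sketch of the equivalent, assuming the same sections data: str.translate() takes a single mapping there, so the null character is deleted by mapping its code point (0) to None, while bytes.translate() still accepts the two-argument form. The names delete_nul, cleaned and raw are illustrative only.

```python
sections = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

# Python 3 str.translate(): a dict mapping code point 0 (NUL) to None deletes it.
delete_nul = {0: None}
cleaned = [section[0].translate(delete_nul) for section in sections]
print(cleaned)

# bytes objects keep the (table, delete) signature; a table of None means
# "delete only, translate nothing".
raw = b'.text\x00\x00\x00'
print(raw.translate(None, b'\x00'))
```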

Upvotes: 3

Atri Basu

Reputation: 21

I think a better way to take care of this particular problem is to use the following function:

import string

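for item in List:
    # item is a one-element list, so filter the string inside it;
    # in Python 2, filter() applied to a str returns a str
    print filter(lambda x: x in string.printable, item[0])

This will get rid of not just \x00 but any other non-printable characters in your strings.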
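
In Python 3, filter() returns an iterator rather than a string, so the result has to be joined back together. A rough sketch of the same idea that runs on both versions, where keep_printable is a hypothetical helper name and not part of any library:

```python
import string

def keep_printable(text):
    """Hypothetical helper: drop every character not in string.printable."""
    allowed = set(string.printable)
    return ''.join(ch for ch in text if ch in allowed)

List = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]

# Each item is a one-element list, so clean the string it contains.
cleaned = [keep_printable(item[0]) for item in List]
print(cleaned)
```

Bear in mind that this also drops any non-ASCII characters, which may or may not be what you want.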

Upvotes: 2

Luka Rahne

Reputation: 10467

lst = (i[0].rstrip('\x00') for i in List)
for j in lst:
    print j,
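
One caveat: rstrip('\x00') removes only trailing null bytes, whereas replace() removes them anywhere in the string. A small sketch of the difference, using a made-up value with an embedded null (padded and the other names are illustrative, not from the question):

```python
padded = 'ab\x00cd\x00\x00\x00'

# rstrip() stops at the first non-NUL character counting from the right.
trailing_only = padded.rstrip('\x00')

# replace() removes every NUL, including the one in the middle.
everywhere = padded.replace('\x00', '')

print(repr(trailing_only))
print(repr(everywhere))
```

For padding at the end of a field, as in the question, both give the same result.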

Upvotes: 5

Chris Doggett

Reputation: 20757

Try a unicode pattern, like this:

re.sub(u'\x00', '', s)

It should give the following results:

import re

l = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]
for x in l:
    # iterate over the strings inside each inner list
    for s in x:
        print re.sub(u'\x00', '', s)

.text
.data
.rsrc

Or, using list comprehensions:

[[re.sub(u'\x00', '', s) for s in x] for x in l]

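Actually, it should work without the 'u' in front of the string as well. Just remove the first three backslashes from your original pattern, pass the string itself rather than str() of the list, and use this as your regex pattern: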

'\x00'
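
To see why the original four-backslash pattern appeared to work but left the brackets in place, here is a small sketch. It assumes the data from the question; row, as_text, from_repr and from_string are illustrative names. Calling str() on a list produces its repr, in which each null byte is written out as the four characters \x00.

```python
import re

row = ['.text\x00\x00\x00']

# str() of a list is its repr: brackets, quotes, and each NUL spelled
# out as a literal backslash followed by "x00".
as_text = str(row)
print(as_text)

# The pattern '\\\\x00' matches that literal backslash sequence in the repr,
# so the brackets and quotes survive.
from_repr = re.sub('\\\\x00', '', as_text)
print(from_repr)

# Applied to the string itself, a pattern containing the real NUL character works.
from_string = re.sub('\x00', '', row[0])
print(from_string)
```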

Upvotes: 4

thkang

Reputation: 11543

from itertools import chain

List = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]    
new_list = [x.replace("\x00", "") for x in chain(*List)]
#['.text', '.data', '.rsrc']

Upvotes: 0

jamylak

Reputation: 133664

>>> L = [['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]
>>> [[x[0]] for x in L]
[['.text\x00\x00\x00'], ['.data\x00\x00\x00'], ['.rsrc\x00\x00\x00']]
>>> [[x[0].replace('\x00', '')] for x in L]
[['.text'], ['.data'], ['.rsrc']]

Or to modify the list in place instead of creating a new one:

for x in L:
    x[0] = x[0].replace('\x00', '')

Upvotes: 15
