Dayana

Reputation: 1538

Obtain a list from a string removing all non-alphanumeric characters

I have this string extracted from a file:

my_string = '\x01\x00\x0e\x00\xff\xff\xffPepe A\x00\xc4\x93\x00\x00100000\x00\xff\xff\xffNu\xf1ez Jim\xe9nez\x00\xf41\x00'

I need to clean that string by removing every character that is neither alphanumeric nor a blank, and split what remains into a list like this:

['Pepe A','100000','Nuñez Jiménez','1']

So far I have tried the following code:

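import re

# Split on control characters in the range \x00-\x0f only
split_string = re.split(r'[\x00-\x0f]', my_string)
# Drop the empty strings left by consecutive separators
result_list = list(filter(None, split_string))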

But I do not get the result I need. Could someone give me some idea? I'm using Python.

Upvotes: 1

Views: 300

Answers (1)

Stephen Rauch

Reputation: 49794

Something like this will get you close:

Code:

re.split(r'ÿÿÿ|AÄ|ô', ''.join(ch for ch in my_string if ch.isalnum() or ch == ' '))

Test Code:

import re

my_string = '\x01\x00\x0e\x00\xff\xff\xffPepe A\x00\xc4\x93\x00\x00100000' \
            '\x00\xff\xff\xffNu\xf1ez Jim\xe9nez\x00\xf41\x00'

print(re.split(r'ÿÿÿ|AÄ|ô', ''.join(ch for ch in my_string
                                    if ch.isalnum() or ch == ' ')))

Results:

['', 'Pepe ', '100000', 'Nuñez Jiménez', '1']
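
The result above still has a leading empty string and loses the trailing 'A' of 'Pepe A', because bytes such as \xff, \xc4 and \xf4 decode to letters (ÿ, Ä, ô) that pass isalnum(). One possible alternative, sketched here under the assumption that the real fields only contain ASCII letters, digits, single spaces and Spanish accented letters, is to whitelist the wanted characters and collect the runs with re.findall. The names WORD, FIELD and extract_fields are hypothetical, and the whitelist would need adjusting for other data:

```python
import re

my_string = ('\x01\x00\x0e\x00\xff\xff\xffPepe A\x00\xc4\x93\x00\x00100000'
             '\x00\xff\xff\xffNu\xf1ez Jim\xe9nez\x00\xf41\x00')

# Assumed whitelist: ASCII letters, digits and Spanish accented letters.
# Characters such as ÿ, Ä and ô are deliberately left out.
WORD = r'[A-Za-z0-9áéíóúüñÁÉÍÓÚÜÑ]+'

# A field is one or more whitelisted words separated by single spaces.
FIELD = re.compile(WORD + r'(?: ' + WORD + r')*')


def extract_fields(raw):
    """Return every run of whitelisted words found in raw."""
    return FIELD.findall(raw)


print(extract_fields(my_string))
# ['Pepe A', '100000', 'Nuñez Jiménez', '1']
```

This trades generality for precision: any legitimate character missing from the whitelist would split or truncate a field, so it is worth checking against more samples of the file.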

Upvotes: 3
