Youcha

Reputation: 1564

Remove characters not in a set in Python

I have a string, and I'm trying to remove every character that is neither alphanumeric nor in this set

'''!$%*()_-=+\/.,><:;'"?|'''.

I know this removes all non-alphanumeric characters, but how can I do better?

re.sub(r'\W+','',line)

Upvotes: 1

Views: 3683

Answers (3)

abought

Reputation: 2680

With credit to this thread: Remove specific characters from a string in python

First, there's no need to retype all the punctuation manually. The string module defines the constant string.punctuation for your convenience. (Use help(string) to see the other similar definitions available.)

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Applying this takes some fiddling to define the undesired characters; a big downside is that, in this form, it only removes the characters you tell it to remove. If you're sure your file contains only 8-bit characters (plain ASCII, for example), you could define the following in Python 2:

>>> delchars = ''.join(c for c in map(chr, range(256)) if c not in (string.punctuation + string.digits + string.letters) )

You can then filter the text by deleting those characters:

>>> text.translate(None, delchars)
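
On Python 3, string.letters and the two-argument form translate(None, delchars) are no longer available. A rough equivalent might look like the sketch below. It assumes the asker's character set rather than all of string.punctuation, and the names EXTRA, ALLOWED, DELETE_TABLE and strip_disallowed are made up for illustration.

```python
import string

# The asker's extra characters; a raw string keeps the backslash literal.
EXTRA = r'''!$%*()_-=+\/.,><:;'"?|'''
ALLOWED = string.ascii_letters + string.digits + EXTRA

# Map every code point below 256 that is not allowed to None (meaning "delete").
DELETE_TABLE = {code: None for code in range(256) if chr(code) not in ALLOWED}


def strip_disallowed(text):
    """Remove the 8-bit characters that are not in ALLOWED."""
    return text.translate(DELETE_TABLE)


print(strip_disallowed("Hello, World! #1 @home [ok]"))
```

Characters with code points above 255 are not in the table and pass through unchanged, so this shares the 8-bit limitation of the version above.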

EDIT: Here's some interesting timing information for the various methods: Stripping everything but alphanumeric chars from a string in Python

Upvotes: 4

Noctis Skytower

Reputation: 22031

In Python 3.x, you can use the str.translate method to remove characters you do not want:

>>> def remove(string, characters):
...     return string.translate(str.maketrans('', '', characters))
...

>>> import string
>>> remove(string.printable, string.ascii_letters + string.digits +
...                          r'''!$%*()_-=+\/.,><:;'"?|''')
'#&@[]^`{}~ \t\n\r\x0b\x0c'
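
To get from this helper to the filtering the question asks for, one option is to work out which characters of the input fall outside the allowed set and pass those to remove. A minimal sketch, where ALLOWED and keep_allowed are illustrative names and not part of the answer above:

```python
import string

# Characters the asker wants to keep (raw string so the backslash stays literal).
ALLOWED = set(string.ascii_letters + string.digits + r'''!$%*()_-=+\/.,><:;'"?|''')


def remove(text, characters):
    # Same helper as above: delete every character listed in `characters`.
    return text.translate(str.maketrans('', '', characters))


def keep_allowed(text):
    # Collect the distinct characters of the input that are not allowed,
    # then delete exactly those.
    unwanted = ''.join(set(text) - ALLOWED)
    return remove(text, unwanted)


print(keep_allowed("a b#c"))
```

Because the unwanted characters are derived from the input itself, this should also drop non-ASCII characters without having to enumerate them in advance.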

Upvotes: 1

Sven Marnach

Reputation: 602115

A Python 2.x non-regex solution:

import string
punctuation = '''!$%*()_-=+\/.,><:;'"?|'''
allowed = string.digits + string.letters + punctuation
filter(allowed.__contains__, s)

The string to filter is s. (This probably isn't the fastest solution for long strings.)
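
In Python 3, string.letters is gone and filter returns an iterator instead of a string, so the same idea needs string.ascii_letters and a join. A sketch under those assumptions (the frozenset and the name filter_allowed are illustrative additions, not part of the original answer):

```python
import string

PUNCTUATION = r'''!$%*()_-=+\/.,><:;'"?|'''
# A frozenset gives constant-time membership tests.
ALLOWED_CHARS = frozenset(string.digits + string.ascii_letters + PUNCTUATION)


def filter_allowed(s):
    # filter() yields the allowed characters lazily; join them back into a str.
    return ''.join(filter(ALLOWED_CHARS.__contains__, s))


print(filter_allowed("x = y + 1; # note"))
```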

Upvotes: 7
