Reputation: 19359
What would be a short, simple way to clean up a user-entered string? Here is the code I currently rely on. It would be great if a shorter, smarter version of it were available.
invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
for c in invalid:
    if len(line)>0: line=line.replace(c,'')
PS: How would I put this for loop (with the nested if) onto a single line?
Upvotes: 2
Views: 193
Reputation: 5488
This works
invalid = '#@$%^_ '
line = "#master_Of^Puppets#@$%Yeah"
line = "".join([l for l in line if l not in invalid])
# line will be 'masterOfPuppetsYeah'
Upvotes: 1
Reputation: 122240
Use a simple generator expression inside join:
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in invalid)
'foobar'
Use the same expression with string.punctuation, optionally plus a space:
>>> import string
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in string.punctuation)
'foo bar'
>>> "".join(i for i in x if i not in string.punctuation+" ")
'foobar'
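If the same filter runs over long or many strings, it may help to build the lookup once as a set, since `c not in some_list` scans the list for every character. A minimal sketch, where `unwanted` and `strip_chars` are names invented for this example:

```python
import string

# Build the lookup once; set membership is a hash lookup rather than a scan.
unwanted = set(string.punctuation) | {" "}

def strip_chars(text, chars=unwanted):
    """Return text without any character found in chars (hypothetical helper)."""
    return "".join(c for c in text if c not in chars)

print(strip_chars("foo * bar"))  # foobar
```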
Use str.translate (this two-argument form is the Python 2 signature):
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> x.translate(None,"".join(invalid))
'foobar'
Use re.sub:
>>> import re
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> y = "[" + re.escape("".join(invalid)) + "]"  # escape so '-' and '^' stay literal in the class
>>> re.sub(y,'',x)
'foobar'
>>> re.sub(y+'+','',x)
'foobar'
Upvotes: 1
Reputation: 3914
Here's a snippet that I use in my own code. You're basically using regex to specify what characters are allowed, matching on those, and then concatenating them together.
import re
def clean(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Remove unwanted characters from string.
    Args:
        string_to_clean: (str) The string from which to remove
            unwanted characters.
        valid: (str) The characters that are valid and should be
            included in the returned sequence. Default character
            set is: 'ACDEFGHIKLMNPQRSTVWY'.
    Returns: (str) A sequence without the invalid characters, as a string.
    """
    valid_string = r'([{}]+)'.format(valid)
    valid_regex = re.compile(valid_string, re.IGNORECASE)
    # Create string of matching characters, concatenate to string
    # with join().
    return ''.join(valid_regex.findall(string_to_clean))
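A short usage sketch may make the behaviour clearer. The function is restated in compact form so the block runs on its own; the sample inputs are made up for illustration. Note that `valid` is interpolated into a character class as-is, so something like `'0-9'` is read as a range.

```python
import re

def clean(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
    # Same approach as above: keep only runs of characters listed in `valid`.
    valid_regex = re.compile(r'([{}]+)'.format(valid), re.IGNORECASE)
    return ''.join(valid_regex.findall(string_to_clean))

print(clean('MKT-AAB*XQ'))           # MKTAAQ (B and X are not in the default set)
print(clean('mkt aa'))               # mktaa  (matching ignores case)
print(clean('a1b2c3', valid='0-9'))  # 123    (valid is read as character-class syntax)
```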
Upvotes: 1
Reputation: 251116
The fastest way to do this is to use str.translate (Python 2 signature shown):
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'
>>> s.translate(None, ''.join(invalid))
'fdsfsFGHGJ'
Timing comparisons:
>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'*100
>>> %timeit re.sub('[#@$%^&*()-+!]', '', s)
1000 loops, best of 3: 766 µs per loop
>>> %timeit re.sub('[#@$%^&*()-+!]+', '', s)
1000 loops, best of 3: 215 µs per loop
>>> %timeit "".join(c for c in s if c not in invalid)
100 loops, best of 3: 1.29 ms per loop
>>> %timeit re.sub(invalid_re, '', s)
1000 loops, best of 3: 718 µs per loop
>>> %timeit s.translate(None, ''.join(invalid)) #Winner
10000 loops, best of 3: 17 µs per loop
On Python 3, translate takes a single mapping argument, so you need to do something like this:
>>> trans_tab = {ord(x):None for x in invalid}
>>> s.translate(trans_tab)
'fdsfsFGHGJ'
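On Python 3 the same table can also be built with `str.maketrans`, whose third argument lists characters to delete. A minimal sketch using the same sample data:

```python
invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']

# str.maketrans(x, y, z): every character in z is mapped to None (deleted).
trans_tab = str.maketrans('', '', ''.join(invalid))

s = '@#$%^&*fdsfs#$%^&*FGHGJ'
print(s.translate(trans_tab))  # fdsfsFGHGJ
```

The resulting table is the same kind of ordinal-to-None dict as the comprehension above, so either form can be passed to `translate`.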
Upvotes: 5
Reputation: 156268
This is one case in which a regex actually is useful.
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> import re
>>> invalid_re = '|'.join(map(re.escape, invalid))
>>> re.sub(invalid_re, '', 'foo * bar')
'foobar'
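If the pattern is reused, one possible variation is to compile it once as an escaped character class instead of an alternation. This is only a sketch; `strip_invalid` is a name invented here, not part of the answer above.

```python
import re

invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']

# re.escape keeps characters such as '-' and '^' literal inside the brackets;
# the trailing '+' removes each run of invalid characters in one substitution.
invalid_class = re.compile('[' + re.escape(''.join(invalid)) + ']+')

def strip_invalid(text):
    """Remove every character listed in `invalid` (hypothetical helper)."""
    return invalid_class.sub('', text)

print(strip_invalid('foo * bar'))  # foobar
```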
Upvotes: 1
Reputation: 122126
You can do it like this:
from string import punctuation # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
line = "".join(c for c in line if c not in punctuation)
For example:
'hello, I @m pleased to meet you! How *about (you) try something > new?'
becomes
'hello I m pleased to meet you How about you try something  new'
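Removing a character that sits between two spaces, such as the `>` here, leaves both spaces behind. If that matters, one option (an addition to the answer, not part of it) is to collapse whitespace afterwards with `split` and `join`:

```python
from string import punctuation

line = 'hello, I @m pleased to meet you! How *about (you) try something > new?'
no_punct = "".join(c for c in line if c not in punctuation)

# split() with no arguments splits on any run of whitespace and drops empties.
collapsed = " ".join(no_punct.split())
print(collapsed)  # hello I m pleased to meet you How about you try something new
```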
Upvotes: 4
Reputation: 2804
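import re
# '-' is placed last in the brackets so it is a literal hyphen, not a range
line = re.sub('[#@$%^&*()+!-]', '', line)
re is the regular expression module. Using square brackets means "match any one of these things inside the brackets". So the call says, "find anything in line that is inside the brackets and replace it with nothing ('')". re.sub returns the new string, so the result has to be assigned back to line.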
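The position of the hyphen inside the brackets matters: between two characters it denotes a range, while at the end of the class it is a literal. A small check on a made-up input shows the difference:

```python
import re

line = 'a-b+c*d'

# ')-+' is read as the range ')' .. '+', so '-' itself is not in the class.
print(re.sub('[#@$%^&*()-+!]', '', line))  # a-bcd

# Placing '-' last makes it a literal hyphen.
print(re.sub('[#@$%^&*()+!-]', '', line))  # abcd
```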
Upvotes: 5