alphanumeric

Reputation: 19359

Simplest Way to Clean Up a String

What is a short, simple way to clean up a user-entered string? Here is the code I currently rely on to strip unwanted characters. A shorter, smarter version would be welcome.

invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
for c in invalid: 
    if len(line)>0: line=line.replace(c,'')

PS: How would I put this for loop (with its nested if) on a single line?

Upvotes: 2

Views: 193

Answers (7)

Kamehameha

Reputation: 5488

This works:

invalid = '#@$%^_ '
line = "#master_Of^Puppets#@$%Yeah"
# Keep every character that is not in the invalid set
line = "".join([l for l in line if l not in invalid])
# line is now 'masterOfPuppetsYeah'

Upvotes: 1

alvas

Reputation: 122240

Use a simple generator expression with join:

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in invalid)
'foobar'

Use the same approach with string.punctuation plus a space:

>>> import string
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in string.punctuation)
'foo  bar'
>>> "".join(i for i in x if i not in string.punctuation+" ")
'foobar'

Use str.translate (this two-argument form works on Python 2 only):

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> x.translate(None,"".join(invalid))
'foobar'

Use re.sub:

>>> import re
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> # re.escape keeps '-' literal; unescaped, ")-+" would be read as a range
>>> y = "[" + re.escape("".join(invalid)) + "]"
>>> re.sub(y,'',x)
'foobar'
>>> re.sub(y+'+','',x)
'foobar'

Upvotes: 1

Michelle Welcks

Reputation: 3914

Here's a snippet that I use in my own code. You're basically using regex to specify what characters are allowed, matching on those, and then concatenating them together.

import re

def clean(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Remove unwanted characters from a string.

    Args:
        string_to_clean: (str) The string from which to remove
            unwanted characters.
        valid: (str) The characters that are valid and should be
            included in the returned sequence. Default character
            set is: 'ACDEFGHIKLMNPQRSTVWY'.

    Returns:
        (str) A sequence without the invalid characters, as a string.
    """
    valid_string = r'([{}]+)'.format(valid)
    valid_regex = re.compile(valid_string, re.IGNORECASE)

    # Create string of matching characters, concatenate to string
    # with join().
    return (''.join(valid_regex.findall(string_to_clean)))
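The same whitelist idea can also be written with a negated character class, which removes everything outside the allowed set in one substitution. This is only a sketch: keep_only is a hypothetical name, not part of the snippet above, and re.escape is added on the assumption that the valid set might one day contain characters such as ']' or '^'.

```python
import re

def keep_only(text, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Return text with every character not in `valid` removed (case-insensitive)."""
    # re.escape guards against regex-special characters appearing in `valid`.
    pattern = '[^{}]'.format(re.escape(valid))
    return re.sub(pattern, '', text, flags=re.IGNORECASE)

print(keep_only('MK-T 12 ayw'))  # MKTayw
```

The default set is the one from the function above, so digits, spaces and the hyphen are dropped while letters in the set survive in either case.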

Upvotes: 1

Ashwini Chaudhary

Reputation: 251116

The fastest way to do this is str.translate (Python 2 form shown here):

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'
>>> s.translate(None, ''.join(invalid))
'fdsfsFGHGJ'

Timing comparisons:

>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'*100

>>> %timeit re.sub('[#@$%^&*()-+!]', '', s)
1000 loops, best of 3: 766 µs per loop

>>> %timeit re.sub('[#@$%^&*()-+!]+', '', s)
1000 loops, best of 3: 215 µs per loop

>>> %timeit "".join(c for c in s if c not in invalid)
100 loops, best of 3: 1.29 ms per loop

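>>> # invalid_re is assumed to be the escaped alternation from SingleNegationElimination's answer
>>> invalid_re = '|'.join(map(re.escape, invalid))
>>> %timeit re.sub(invalid_re, '', s)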
1000 loops, best of 3: 718 µs per loop

>>> %timeit s.translate(None, ''.join(invalid))         #Winner
10000 loops, best of 3: 17 µs per loop

On Python 3, str.translate takes a single mapping argument, so you need something like this:

>>> trans_tab = {ord(x):None for x in invalid}
>>> s.translate(trans_tab)
'fdsfsFGHGJ'
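On Python 3 the deletion table can also be built with str.maketrans, whose optional third argument lists the characters to delete. A minimal sketch, reusing the character list from the question (delete_table and cleaned are illustrative names):

```python
# Characters to strip, taken from the question's list.
invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']

# The third argument to str.maketrans names the characters to delete.
delete_table = str.maketrans('', '', ''.join(invalid))

s = '@#$%^&*fdsfs#$%^&*FGHGJ'
cleaned = s.translate(delete_table)
print(cleaned)  # fdsfsFGHGJ
```

Building the table once and reusing it avoids rebuilding the mapping for every string that needs cleaning.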

Upvotes: 5

SingleNegationElimination

Reputation: 156268

This is one case in which a regex actually is useful.

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> import re
>>> invalid_re = '|'.join(map(re.escape, invalid))
>>> re.sub(invalid_re, '', 'foo * bar')
'foobar'

Upvotes: 1

jonrsharpe

Reputation: 122126

You can do it like this:

from string import punctuation # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~

line = "".join(c for c in line if c not in punctuation)

For example:

'hello, I @m pleased to meet you! How *about (you) try something > new?'

becomes

'hello I m pleased to meet you How about you try something  new'

Upvotes: 4

Dan

Reputation: 2804

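import re
# '-' is placed last so it is a literal hyphen rather than part of a ")-+" range
line = re.sub('[#@$%^&*()+!-]', '', line)

re is the regular expression module. Square brackets mean "match any one of the characters inside the brackets". So the call says, "find any character in line that appears inside the brackets and replace it with nothing ('')". Strings are immutable, so the result has to be assigned back to line.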
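One detail worth checking in any character class: a hyphen placed between two characters is read as a range, so ")-+" means "every character from ')' through '+'" and the hyphen itself is not matched. The sketch below compares the two spellings; the variable names are illustrative only.

```python
import re

# Inside [...], ")-+" is the range ')' through '+', so '-' itself is not matched.
range_pattern = '[#@$%^&*()-+!]'
# Putting '-' last in the class makes it a literal hyphen.
literal_pattern = '[#@$%^&*()+!-]'

sample = 'a-b+c'
with_range = re.sub(range_pattern, '', sample)
with_literal = re.sub(literal_pattern, '', sample)
print(with_range)    # a-bc
print(with_literal)  # abc
```

Passing the characters through re.escape before placing them in the brackets is another way to avoid the accidental range.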

Upvotes: 5
