Reputation: 407
I am trying to write a function that accepts a string (a sentence), cleans it, and returns only the letters, digits and hyphens. However, the code seems to fail. Kindly let me know what I am doing wrong here.
Example: Blake D'souza is an !d!0t
Should return: Blake D'souza is an d0t
Python:
def remove_unw2anted(str):
    str = ''.join([c for c in str if c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890\''])
    return str

def clean_sentence(s):
    lst = [word for word in s.split()]
    #print lst
    for items in lst:
        cleaned = remove_unw2anted(items)
    return cleaned

s = 'Blake D\'souza is an !d!0t'
print clean_sentence(s)
Upvotes: 0
Views: 3042
Reputation: 4903
A variation using string.translate, which has the (arguable) benefit of being easy to extend and is part of the string module.
import string
allchars = string.maketrans('','')
tokeep = string.letters + string.digits + '-'
toremove = allchars.translate(None, tokeep)
s = "Blake D'souza is an !d!0t"
print s.translate(None, toremove)
Output:
BlakeDsouzaisand0t
The OP said to keep only letters, digits and hyphens. Perhaps they meant to keep whitespace (and the apostrophe) as well? If so, add those characters to tokeep.
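For anyone on Python 3, where string.maketrans and string.letters no longer exist, the same idea might look roughly like the sketch below. It assumes that spaces and the apostrophe should be kept too, as the question's expected output suggests. The names KEEP and clean_sentence are illustrative choices, not part of the code above.

```python
import string

# Assumption: keep ASCII letters, digits, hyphen, apostrophe and space.
KEEP = set(string.ascii_letters + string.digits + "-' ")

def clean_sentence(s):
    # Build a translation table that maps every unwanted character
    # actually present in s to None, which makes str.translate delete it.
    table = {ord(c): None for c in set(s) if c not in KEEP}
    return s.translate(table)

print(clean_sentence("Blake D'souza is an !d!0t"))
```

Extending the whitelist is then just a matter of adding characters to KEEP.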
Upvotes: 1
Reputation: 17606
You only return the last cleaned word! Each pass through the loop overwrites cleaned, so the earlier words are lost.
Should be:
def clean_sentence(s):
    lst = [word for word in s.split()]
    lst_cleaned = []
    for items in lst:
        lst_cleaned.append(remove_unw2anted(items))
    return ' '.join(lst_cleaned)

A shorter method could be this:
def is_ok(c):
    return c.isalnum() or c in " '"

def clean_sentence(s):
    return filter(is_ok, s)

s = "Blake D'souza is an !d!0t"
print clean_sentence(s)
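Note that the shorter version relies on Python 2, where filter applied to a string returns a string. In Python 3, filter returns an iterator, so the characters have to be joined back together. A minimal sketch under that assumption, with the same is_ok helper repeated so the snippet runs on its own:

```python
def is_ok(c):
    # Keep letters and digits, plus space and apostrophe.
    # str.isalnum() also accepts non-ASCII letters and digits.
    return c.isalnum() or c in " '"

def clean_sentence(s):
    # filter() yields the kept characters lazily; join them into a string.
    return ''.join(filter(is_ok, s))

s = "Blake D'souza is an !d!0t"
print(clean_sentence(s))
```

If only ASCII characters should survive, an explicit whitelist is probably the safer choice than isalnum().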
Upvotes: 5