Prem Minister

Reputation: 407

Python cleaning words in a sentence

I am trying to write a function that accepts a string (a sentence), cleans it, and returns only the letters, digits and hyphens. However, the code does not work as expected. Kindly let me know what I am doing wrong here.

Example: Blake D'souza is an !d!0t
Should return: Blake D'souza is an d0t

Python:

def remove_unw2anted(str):
    str = ''.join([c for c in str if c in 'ABCDEFGHIJKLNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890\''])
    return str

def clean_sentence(s):
    lst = [word for word in s.split()]
    #print lst
    for items in lst:
        cleaned = remove_unw2anted(items)
    return cleaned

s = 'Blake D\'souza is an !d!0t'
print clean_sentence(s)

Upvotes: 0

Views: 3042

Answers (2)

sotapme

Reputation: 4903

A variation using string.translate, which has the (arguable) benefit of being easy to extend and is part of the standard string module.

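import string

# Python 2: an identity table, i.e. a string holding all 256 byte values
allchars = string.maketrans('', '')
# Characters to keep
tokeep = string.letters + string.digits + '-'
# Every character that is not in tokeep
toremove = allchars.translate(None, tokeep)

s = "Blake D'souza is an !d!0t"
# Delete the unwanted characters from s
print s.translate(None, toremove)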

Output:

BlakeDsouzaisand0t

The OP said to keep only letters, digits and the hyphen; perhaps they meant to keep whitespace (and the apostrophe) as well?
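If whitespace and the apostrophe should survive, as the expected output in the question suggests, the same translate idea might look roughly like this on Python 3, where `string.maketrans` and `string.letters` no longer exist. This is only a sketch assuming ASCII input; the names `KEEP` and `clean` are illustrative and not part of the answer above.

```python
import string

# Characters to keep: ASCII letters, digits, hyphen, apostrophe and space.
# (KEEP and clean are hypothetical names chosen for this sketch.)
KEEP = set(string.ascii_letters + string.digits + "-' ")

def clean(s):
    # str.maketrans('', '', chars) builds a table that deletes every
    # character listed in chars; here that is everything in s outside KEEP.
    table = str.maketrans('', '', ''.join(set(s) - KEEP))
    return s.translate(table)

print(clean("Blake D'souza is an !d!0t"))  # Blake D'souza is an d0t
```

Extending the set of kept characters is then a matter of editing `KEEP`.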

Upvotes: 1

Don

Reputation: 17606

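You only return the last cleaned word! The variable cleaned is overwritten on every pass through the loop, so only the final word survives.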

Should be:

def clean_sentence(s):
    lst = s.split()

    lst_cleaned = []
    for items in lst:
        lst_cleaned.append(remove_unw2anted(items))
    return ' '.join(lst_cleaned)
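The same fix can be written more compactly with a generator expression. The sketch below is an assumption-laden variant rather than the original code: it renames the helper to `remove_unwanted` and builds the allowed set from `string.ascii_letters`, because the hand-typed alphabet in the question happens to be missing an uppercase `M`.

```python
import string

# Allowed characters: ASCII letters, digits and the apostrophe, matching
# the question's code (ALLOWED and remove_unwanted are illustrative names).
ALLOWED = set(string.ascii_letters + string.digits + "'")

def remove_unwanted(word):
    # Keep only the characters found in ALLOWED.
    return ''.join(c for c in word if c in ALLOWED)

def clean_sentence(s):
    # Clean each whitespace-separated word, then rejoin with single spaces.
    return ' '.join(remove_unwanted(word) for word in s.split())

print(clean_sentence("Blake D'souza is an !d!0t"))  # Blake D'souza is an d0t
```

Note that splitting and rejoining collapses any run of whitespace into a single space.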

A shorter method could be this:

def is_ok(c):
    return c.isalnum() or c in " '"

def clean_sentence(s):
    return filter(is_ok, s)

s = "Blake D'souza is an !d!0t"
print clean_sentence(s)
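On Python 3, `filter` returns an iterator rather than a string, so the result has to be joined back together. A minimal sketch of that adaptation, keeping the names from the snippet above:

```python
def is_ok(c):
    # Letters and digits pass, as do the space and the apostrophe.
    return c.isalnum() or c in " '"

def clean_sentence(s):
    # filter() yields the accepted characters lazily; join them into a str.
    return ''.join(filter(is_ok, s))

s = "Blake D'souza is an !d!0t"
print(clean_sentence(s))  # Blake D'souza is an d0t
```

Be aware that `str.isalnum()` on Python 3 also accepts non-ASCII letters and digits, so this version is more permissive than an explicit ASCII whitelist.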

Upvotes: 5
