tiguero

How to remove non-alphanumeric characters in Python but keep some special characters

I figured out how to strip the leading non-alphanumeric characters from a string in Python using the following function:

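import re

# One or more non-word characters; match() only tries at the start of the string.
p_nonalphanum = re.compile(r'\W+')

def removeNonAlphaNum(string):
    m = p_nonalphanum.match(string)
    if m:
        # Drop the leading run of non-alphanumeric characters.
        string = string[m.end():]
    return string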

I would like to keep some special characters, though, such as ½ and ¾, which I consider numbers. How should I edit my regex?

For example, "• ½ cup flour" should become "½ cup flour".
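
For comparison, a minimal Python 3 sketch of the same function, assuming the input is a str rather than a UTF-8 byte string (the name remove_leading_non_alnum is illustrative, not from the question). On str patterns in Python 3, \w matches every character for which str.isalnum() is true, and that includes vulgar fractions such as ½ and ¾, so the unmodified pattern already keeps them. The problem described above shows up in Python 2, where \w is ASCII-only unless the re.UNICODE flag is set.

```python
import re

# On a Python 3 str pattern, \W matches any character that is neither
# alphanumeric (as defined by str.isalnum()) nor an underscore.
P_NON_ALNUM = re.compile(r'\W+')

# remove_leading_non_alnum is an illustrative name, not from the question.
def remove_leading_non_alnum(text):
    """Strip the run of non-word characters at the start of text."""
    m = P_NON_ALNUM.match(text)  # match() only looks at the start
    if m:
        text = text[m.end():]
    return text

result = remove_leading_non_alnum("• ½ cup flour")  # "½ cup flour"
```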

Answers (3)

Nolen Royalty

Don't bother with a regex where you manually add each character you want; use the built-in isalnum() string method instead!

>>> s = "• ½ cup flour -> ½ cup flour"
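>>> def only_alphanum(s):
...     s = unicode(s, "utf-8")  # Python 2: decode the UTF-8 bytes so ½ is one character
...     # Keep only the whitespace-separated tokens made entirely of alphanumeric characters.
...     return ' '.join(word for word in s.split() if word.isalnum())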
... 
>>> print only_alphanum(s)
½ cup flour ½ cup flour

This keeps any fraction, instead of only the fractions you list by hand in your regex (a list that could get long very quickly):

>>> s = "• ¼ cup oats -*(*&!!"
>>> print only_alphanum(s)
¼ cup oats
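
Because only_alphanum tests whole whitespace-separated tokens, a token with punctuation attached (for example "flour,") is dropped entirely. A minimal Python 3 variant that filters character by character instead, keeping letters, digits, numeric characters such as ¼, and whitespace; the name keep_alnum_chars is hypothetical:

```python
# keep_alnum_chars is a hypothetical helper, not part of the answer above.
def keep_alnum_chars(text):
    """Drop every character that is neither alphanumeric nor whitespace."""
    # str.isalnum() is True for letters, digits and other numeric
    # characters such as the vulgar fractions ¼, ½ and ¾.
    kept = ''.join(c for c in text if c.isalnum() or c.isspace())
    # Collapse the runs of whitespace left behind by removed characters.
    return ' '.join(kept.split())

cleaned = keep_alnum_chars("• ½ cup flour, sifted")  # "½ cup flour sifted"
```

Unlike the token-based version, "flour," survives as "flour" here; the trade-off is that characters inside a word (such as the hyphen in "all-purpose") are removed rather than kept.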

stema

You can use a negated character class that contains all the characters you want to keep.

You could do something like this:

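import re
p_nonalphanum = re.compile(r'[^\w\s½¾]+')  # anything except word chars, whitespace, ½ and ¾
test = "• ½ cup flour"
print(p_nonalphanum.sub('', test).strip())  # ½ cup flour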
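
The question's function only trims junk at the start of the string, whereas sub() removes matching characters anywhere in it. If only the leading run should go, a minimal sketch (the name strip_leading_junk is hypothetical) anchors the same negated class with ^:

```python
import re

# The first ^ anchors the match to the start of the string; the ^ inside the
# brackets negates the class. Only the leading run of characters outside
# [\w½¾] is removed, so later punctuation survives.
P_LEADING_JUNK = re.compile(r'^[^\w½¾]+')

# strip_leading_junk is a hypothetical helper name.
def strip_leading_junk(text):
    """Remove non-word characters (other than ½ and ¾) at the start of text."""
    return P_LEADING_JUNK.sub('', text)

line = strip_leading_junk("• ½ cup flour, sifted")  # "½ cup flour, sifted"
```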

johnsyweb

>>> import re
>>> def remove_unwanted(s):
...     '''• ½ cup flour -> ½ cup flour'''
...     allowed = r'[\w½¾]+'  # runs of word characters, ½ or ¾
...     return ' '.join(re.findall(allowed, s))
... 
>>> print remove_unwanted('• ½ cup flour ->')
½ cup flour
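
The allowed class is easy to extend. A minimal sketch, assuming you also want to keep ASCII fractions such as 1/2 together: the hypothetical make_token_filter builds the class from a string of extra characters and passes them through re.escape() so that characters like - or ] cannot break the class.

```python
import re

# make_token_filter is a hypothetical helper, not part of the answer above.
def make_token_filter(extra=''):
    """Return a function that keeps runs of word characters plus `extra`."""
    # re.escape() backslash-escapes regex metacharacters such as - and ],
    # so they are treated literally inside the character class.
    allowed = re.compile('[\\w' + re.escape(extra) + ']+')

    def remove_unwanted(s):
        # Join the matching runs with single spaces, as in the answer above.
        return ' '.join(allowed.findall(s))

    return remove_unwanted

keep_fractions = make_token_filter('½¾/')
line = keep_fractions('• 1/2 cup flour ->')  # "1/2 cup flour"
```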
