Reputation: 11537
I figured out how to remove special non-alphanumeric characters in Python using the following function:
import re

p_nonalphanum = re.compile(r'\W+')

def removeNonAlphaNum(string):
    # match() only looks at the start of the string, so this strips a leading run
    m = p_nonalphanum.match(string)
    if m:
        string = string[m.end():]
    return string
I would like to keep some special characters, though, such as ½ and ¾, which I consider numbers. How should I edit my regex?
E.g.: from "• ½ cup flour" to "½ cup flour"
Upvotes: 2
Views: 1514
Reputation: 18633
Don't bother with a regex where you manually add each character you want; use the built-in isalnum method!
>>> s = "• ½ cup flour -> ½ cup flour"
>>> def only_alphanum(s):
...     s = unicode(s, "utf-8")
...     return ' '.join(c for c in s.split() if c.isalnum())
...
>>> print only_alphanum(s)
½ cup flour ½ cup flour
This will let you catch any fraction, instead of just the list of fractions that you've assembled in your regex (which could get long very quickly).
>>> s = "• ¼ cup oats -*(*&!!"
>>> print only_alphanum(s)
¼ cup oats
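For anyone on Python 3, here is a rough sketch of the same idea. It assumes Python 3, where str is already Unicode, so the unicode() call is not needed. Note that the filter works on whole whitespace-separated tokens, not on individual characters:

```python
def only_alphanum(s):
    # Keep only the whitespace-separated tokens that consist entirely of
    # alphanumeric characters; str.isalnum() also accepts numeric
    # characters such as ½ and ¼.
    return ' '.join(tok for tok in s.split() if tok.isalnum())

print(only_alphanum("• ½ cup flour -> ½ cup flour"))
print(only_alphanum("• ¼ cup oats -*(*&!!"))
```

One consequence of filtering by token: a word with punctuation attached, such as "flour," is dropped completely, because "flour,".isalnum() is False. If that matters for your data, a character-level approach may fit better.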
Upvotes: 2
Reputation: 92986
You can use a negated character class and add all the characters you want to keep.
You could do something like this:
p_nonalphanum = re.compile(r'[^\w½¾]+')
print(p_nonalphanum.sub('', test))  # test holds the input string
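One thing to watch with this pattern: a space is not a word character, so substituting with an empty string removes the spaces between words as well. Below is a possible variant, assuming Python 3, that adds \s to the class to keep whitespace and then collapses what is left. The names p_unwanted and strip_unwanted are made up for this example:

```python
import re

# Same negated class as above; substituting with '' also eats the spaces.
p_nonalphanum = re.compile(r'[^\w½¾]+')
print(p_nonalphanum.sub('', '• ½ cup flour'))  # prints: ½cupflour

# Assumption: whitespace is added to the "keep" set so words stay separated.
p_unwanted = re.compile(r'[^\w\s½¾]+')

def strip_unwanted(text):
    # Delete runs of unwanted characters, then normalise leftover whitespace.
    cleaned = p_unwanted.sub('', text)
    return ' '.join(cleaned.split())

print(strip_unwanted('• ½ cup flour'))  # prints: ½ cup flour
```

The final split/join step is only there to drop the leading space left behind once the bullet is removed.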
Upvotes: 3
Reputation: 141810
>>> def remove_unwanted(s):
...     '''• ½ cup flour -> ½ cup flour'''
...     allowed = r'[\w½¾]+'
...     return ' '.join(re.findall(allowed, s))
...
>>> print remove_unwanted('• ½ cup flour ->')
½ cup flour
Upvotes: 2