Reputation: 43
I am trying to use a Python regular expression to remove characters that look like non-ASCII characters from a string. Here is my code:
import re
xxx='Juliana Gon\xe7alves Miguel'
t=re.sub('\w*','',xxx)
t
The result is:
>>> xxx='Juliana Gon\xe7alves Miguel'
>>> t=re.sub('\w*','',xxx)
>>> t
' \xe7 '
This \xe7 is what I am trying to remove. Does anyone have any ideas?
Upvotes: 2
Views: 1160
Reputation: 31739
If the desired output is
'Juliana Gonalves Miguel'
then the following regex should do the trick.
re.sub('(?![ -~]).', '', xxx)
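[ -~]: a short, readable character class covering the printable ASCII characters (space through tilde)
(?!...): negative lookahead, so the following . matches only a character that is not in that class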
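For comparison, here is a minimal sketch of the same idea written with a negated character class. It assumes Python 3, where the string is Unicode text. The helper name strip_non_ascii is hypothetical and used only for illustration. Note that the attempt in the question, '\w*', removes the word characters themselves, which is why only the spaces and \xe7 were left behind.

```python
import re

# Any single character outside the printable ASCII range (space through tilde).
NON_PRINTABLE_ASCII = re.compile(r'[^ -~]')


def strip_non_ascii(text):
    """Hypothetical helper: drop every character outside ' '..'~'."""
    return NON_PRINTABLE_ASCII.sub('', text)


xxx = 'Juliana Gon\xe7alves Miguel'

# Lookahead form from the answer above.
print(re.sub(r'(?![ -~]).', '', xxx))

# Negated-class form; same result for this input.
print(strip_non_ascii(xxx))
```

The two forms differ in one case: . does not match a newline by default, so the lookahead version leaves newlines in place, while [^ -~] removes them along with other control characters.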
Upvotes: 2