George Wang

Reputation: 43

Python regular expression to remove non-Unicode characters

I am trying to use a Python regular expression to remove some characters that look like non-Unicode characters from a string. Here is my code:

import re

xxx = 'Juliana Gon\xe7alves Miguel'
t = re.sub(r'\w*', '', xxx)  # remove every run of word characters
print(repr(t))

The result is:

>>> xxx='Juliana Gon\xe7alves Miguel'
>>> t=re.sub('\w*','',xxx)
>>> t
' \xe7 '

This \xe7 is what I am trying to remove. Does anyone have any ideas?
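The leftover \xe7 is ç (U+00E7), which is a valid Unicode character; it survives because of how \w is defined. The session above looks like Python 2, where \w on a byte string matches only [a-zA-Z0-9_], so the byte 0xE7 is never matched. As a sketch, assuming Python 3 (where str patterns are Unicode-aware by default), the re.ASCII flag reproduces that result:

```python
import re

# 'Gon\xe7alves' contains U+00E7 (LATIN SMALL LETTER C WITH CEDILLA).
xxx = 'Juliana Gon\xe7alves Miguel'

# With re.ASCII, \w only matches [a-zA-Z0-9_], so the ç is left behind,
# mirroring the Python 2 result shown in the question.
ascii_only = re.sub(r'\w*', '', xxx, flags=re.ASCII)

# By default (Python 3 str patterns), \w is Unicode-aware and also matches ç.
unicode_aware = re.sub(r'\w*', '', xxx)

print(repr(ascii_only))     # ' ç '
print(repr(unicode_aware))  # '  '
```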

Upvotes: 2

Views: 1160

Answers (1)

Maximilian Peters

Reputation: 31739

If the desired output is

'Juliana Gonalves Miguel'

then the following regex should do the trick.

re.sub('(?![ -~]).', '', xxx)

[ -~]: a short, readable character class matching every printable ASCII character (space, 0x20, through tilde, 0x7E)

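(?!...): negative lookahead; combined with the following ., the pattern matches any single character that is not printable ASCII, and re.sub replaces it with an empty string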
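Putting the two pieces together, a minimal runnable sketch, assuming Python 3; the helper name strip_non_printable_ascii is hypothetical. It also compares the pattern with a negated character class, [^ -~], and with str.encode('ascii', 'ignore'), because the three differ in how they treat control characters such as tab and newline:

```python
import re

def strip_non_printable_ascii(s):
    """Hypothetical helper: drop every character outside [ -~] except newlines."""
    # (?![ -~]) fails on printable ASCII; '.' then consumes one other character.
    # Without re.DOTALL, '.' never matches a newline, so newlines are kept.
    return re.sub(r'(?![ -~]).', '', s)

xxx = 'Juliana Gon\xe7alves Miguel'
print(strip_non_printable_ascii(xxx))  # Juliana Gonalves Miguel

# Comparison on a string that also contains a tab and a newline.
sample = 'Gon\xe7alves\tMiguel\n'
lookahead = strip_non_printable_ascii(sample)               # removes ç and tab, keeps newline
negated_class = re.sub(r'[^ -~]', '', sample)               # removes ç, tab and newline
encoded = sample.encode('ascii', 'ignore').decode('ascii')  # removes only ç

print(repr(lookahead))      # 'GonalvesMiguel\n'
print(repr(negated_class))  # 'GonalvesMiguel'
print(repr(encoded))        # 'Gonalves\tMiguel\n'
```

Which variant fits depends on whether tabs and newlines should survive; the lookahead pattern keeps newlines only because . does not match them by default.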

Upvotes: 2
