Reputation: 4242
I have the following function to keep only ASCII characters:
import re
def remove_special_chars(txt):
txt = re.sub(r'[^\x00-\x7F]+', '', txt)
return txt
But now I also want to keep the copyright symbol (©). What should I add to the pattern?
Upvotes: 3
Views: 772
Reputation: 5207
From Python3.8 onwards Unicode character name may be used in regular expressions via \N{name}
escapes.
So copyright symbol may be used like r'\N{COPYRIGHT SIGN}'
.
In your case,
import re
s = "Python® Copyright ©2001-2019"
re.sub(r'[^\x00-\x7F\N{COPYRIGHT SIGN}]+', '', s)
This gives
'Python Copyright ©2001-2019'
As you can see, the '®' was removed.
Upvotes: 0