Reputation: 23690
I'm struggling to remove XML-escaped Unicode characters (numeric character references) from strings. Adapting this solution for Python 3 fails:
s = 'foo&#1057;&#1098;&#1073;bar'
s.encode('ascii', errors='ignore')
# b'foo&#1057;&#1098;&#1073;bar'
I've also tried unescaping with xml.sax.saxutils but with no luck:
from xml.sax.saxutils import unescape
unescape(s).encode('ascii', errors='ignore')
# b'foo&#1057;&#1098;&#1073;bar'
Any suggestions appreciated.
Upvotes: 0
Views: 274
Reputation: 36838
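You can use html.unescape for this task. Unlike xml.sax.saxutils.unescape, which by default only handles &amp;, &lt; and &gt;, it also decodes numeric character references, so the subsequent ASCII encode with errors='ignore' can drop the resulting non-ASCII characters: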
import html
s = 'foo&#1057;&#1098;&#1073;bar'
s2 = html.unescape(s).encode('ascii', errors='ignore')
print(s2)
output:
b'foobar'
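If you want a str back rather than bytes, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name strip_escaped_non_ascii is made up for illustration, and it assumes the input carries character references such as &#1057;.

```python
import html


def strip_escaped_non_ascii(text: str) -> str:
    """Decode character references, then drop every non-ASCII character.

    Hypothetical helper name, not part of any library.
    """
    # html.unescape turns '&#1057;' (and named references like '&eacute;')
    # into the actual characters.
    decoded = html.unescape(text)
    # Encoding with errors='ignore' silently discards anything outside ASCII;
    # decoding again gives a str instead of bytes.
    return decoded.encode("ascii", errors="ignore").decode("ascii")


print(strip_escaped_non_ascii("foo&#1057;&#1098;&#1073;bar"))  # foobar
```

Keep in mind that html.unescape also decodes ASCII references, so '&amp;' comes back as a literal '&' in the result. If the string is later embedded in XML again, it would need re-escaping.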
Upvotes: 1