Reputation: 8155
I have a long string, which includes the text Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,
I am trying to replace '\xe2\x82\xac' with 'EUR' in Python 3.6
If I print the string, I see that it is preceded by b, i.e. it looks like a bytes literal:
b'<div dir="ltr"><br ...' etc.
I cannot encode it (html = html.encode('UTF-8')), because then I get "a bytes-like object is required, not 'str'", nor can I decode it ("'str' object has no attribute 'decode'").
I have tried:
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")
None of these work.
html.decode("utf-8")
gets me the error 'str' object has no attribute 'decode'.
For context, the string is generated by reading the content of an e-mail with the mailbox library:
for message in mbox:
    for part in message.walk():
        html = str(part.get_payload(decode=True))
Upvotes: 0
Views: 780
Reputation: 788
import unicodedata
jil = """"Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
data = unicodedata.normalize("NFKD", jil)
print(data)
>>>" Your Sunday evening order with Uber Eats
To: [email protected]
[image: map]
[image: Uber logo]
â¬17.50
Thanks for choosing Uber,
Upvotes: 1
Reputation: 9533
You should use:
html = html.replace(r"\xe2\x82\xac", "EUR")
This replaces the literal text \xe2\x82\xac with EUR, assuming the backslashes are literally present in your html string.
Otherwise, you should use:
html = html.replace('\u20ac', 'EUR')
But that does not seem to be the case here, because your attempts with the Unicode symbol did not work.
Do not assume that Python uses UTF-8 in its strings (in fact, a str is not a sequence of UTF-8 bytes).
Note: a Python 3 str holds Unicode code points rather than UTF-8 bytes, so \xe2\x82\xac would never be written by Python from a correctly decoded string. So either the \ was literal, or some output process mangled it.
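To make the two cases concrete, here is a minimal sketch. It assumes the payload is UTF-8 encoded, which the question does not confirm; the sample `payload` bytes and the names `mangled`, `fixed_literal` and `fixed_decoded` are made up for illustration.

```python
# Hypothetical sample standing in for part.get_payload(decode=True), which returns bytes.
payload = b"[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"

# str() on bytes gives the repr text: it starts with b' and the escapes become
# literal backslash sequences, so replacing a real euro sign finds nothing.
mangled = str(payload)
assert mangled.startswith("b'")
assert "\u20ac" not in mangled

# Case 1: the backslashes are literal, so a raw string matches them.
fixed_literal = mangled.replace(r"\xe2\x82\xac", "EUR")

# Case 2 (probably cleaner): decode the bytes first, then replace the real euro sign.
# Assumes UTF-8; a real message part may declare a different charset.
fixed_decoded = payload.decode("utf-8").replace("\u20ac", "EUR")

print(fixed_literal)
print(fixed_decoded)
```

If the second form works for your messages, dropping the str() call in the mailbox loop and decoding the payload instead avoids the b'...' wrapper entirely.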
Upvotes: 2
Reputation: 139
It does not work that way.
html="Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")
html = html.encode("utf-8", 'strict')
print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))
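Note that in the snippet above `html` is an ordinary str literal, so `\xe2\x82\xac` there means three separate characters (â, U+0082, ¬) rather than a euro sign. A minimal sketch of what that implies, assuming the text was UTF-8 bytes misread as Latin-1; the variable names are illustrative only:

```python
# In a str literal, each \xNN escape is one code point, not one byte.
text = "\xe2\x82\xac17.50"
assert len(text) == 8  # three Latin-1 range characters plus "17.50"

# Replacing those same three characters does work on such a literal.
replaced = text.replace("\xe2\x82\xac", "EUR")

# Re-encoding as Latin-1 recovers the original bytes, which decode to the euro sign.
repaired = text.encode("latin-1").decode("utf-8")

print(replaced)
print(repaired == "\u20ac17.50")
```

This only applies when the str really contains those three characters; it does not help when the str is the b'...' text with literal backslashes described in the question.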
Upvotes: 0