Alexis Eggermont

Reputation: 8155

Replacing special character in string not working

I have a long string that includes the following text: Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,

I am trying to replace '\xe2\x82\xac' with 'EUR' in Python 3.6

If I print the string, I see that it is preceded by b, i.e. it is a byte literal.

 b'<div dir="ltr"><br ...' etc.

I cannot encode it (html = html.encode('UTF-8')), because then I get "a bytes-like object is required, not 'str'". Nor can I decode it, because that fails with "'str' object has no attribute 'decode'".

I have tried:

html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

None of these work.

html.decode("utf-8") gets me an error 'str' object has no attribute 'decode'.

For context, the string is generated by reading the content of an e-mail with the mailbox library:

for message in mbox:
   for part in message.walk():
       html = str(part.get_payload(decode=True))

Upvotes: 0

Views: 780

Answers (3)

Veera Balla Deva

Reputation: 788

import unicodedata
jil = """"Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
data = unicodedata.normalize("NFKD", jil)
print(data)
>>>" Your Sunday evening order with Uber Eats
To: [email protected]


[image: map]

[image: Uber logo]
â¬17.50
Thanks for choosing Uber,

Upvotes: 1

Giacomo Catenazzi

Reputation: 9533

You should use:

html = html.replace(r"\xe2\x82\xac", "EUR")

This replaces the literal text \xe2\x82\xac with EUR, assuming that the \ characters are literally present in your html string.

Otherwise, you should use:

html = html.replace('\u20ac', 'EUR')

But that does not seem to be the case here, because your attempt with the Unicode symbol did not work.

Do not assume that Python uses UTF-8 in its strings (in fact, it does not use UTF-8 internally).

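Note: Python 3 stores str as Unicode code points rather than UTF-8 bytes (CPython uses Latin-1, UCS-2, or UCS-4 internally, depending on the content), so \xe2\x82\xac would never be written by Python from a properly decoded string. So either the \ was literal, or some output process mangled it.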
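
To illustrate the "literal \" case, here is a minimal sketch. It assumes the payload is UTF-8 encoded bytes, and the short payload value is a made-up stand-in for what part.get_payload(decode=True) returns. Calling str() on bytes produces the b'...' repr text, so the backslashes become real characters, which is why a raw-string pattern matches and the other attempts do not:

```python
# Hypothetical stand-in for part.get_payload(decode=True), which returns bytes.
payload = "Uber logo]\n\u20ac17.50\nThanks".encode("utf-8")

# str() on bytes yields the repr text: b'...' with literal backslash sequences.
as_repr = str(payload)
print(as_repr)

# A raw-string pattern matches those literal backslashes.
fixed_repr = as_repr.replace(r"\xe2\x82\xac", "EUR")
print(fixed_repr)

# Decoding instead gives real text, where the euro sign is a single character.
text = payload.decode("utf-8")
fixed_text = text.replace("\u20ac", "EUR")
print(fixed_text)
```

Decoding is probably the cleaner route, since it also removes the b'...' wrapper and turns the \n sequences back into real newlines.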

Upvotes: 2

Luca Di Sabatino

Reputation: 139

It does not work that way.

html="Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

html = html.encode("utf-8", 'strict')

print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))
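
Applied to the loop from the question, one possible approach is to skip str() and decode the payload bytes directly. This is only a sketch: the message is built in memory as a stand-in for one read from the mbox, and falling back to UTF-8 when a part declares no charset is an assumption.

```python
from email.message import EmailMessage

# Hypothetical message standing in for one read from the mbox.
message = EmailMessage()
message.set_content("[image: Uber logo]\n\u20ac17.50\n", charset="utf-8")

cleaned = []
for part in message.walk():
    raw = part.get_payload(decode=True)  # bytes, or None for multipart containers
    if raw is None:
        continue
    # Use the charset declared by the part; UTF-8 is an assumed fallback.
    charset = part.get_content_charset() or "utf-8"
    html = raw.decode(charset, errors="replace")
    cleaned.append(html.replace("\u20ac", "EUR"))

print(cleaned[0])
```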

Upvotes: 0
