VishalQuery

Reputation: 119

Error while changing the encoding of a string in Python (UTF-8 and cp1252)

I need to convert a string into a human-readable format.

s = "that’s awful, Find – Best Quotes, “Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything.” ― Plato."

I want to convert this string to "that's awful, Find - Best Quotes, "Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything." ― Plato."

But I run into a different problem with each approach I try:

  1. When I use print(str(s.encode('cp1252',"ignore"),'utf-8')), I get:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 4

  2. When I use print(str(s.encode('cp1252'),'utf-8',"ignore")), I get:

    UnicodeEncodeError: 'charmap' codec can't encode character '\u2015' in position 151

  3. When I use print(str(s.encode('cp1252',"ignore"),'utf-8',"ignore")), I get no error, but, as expected, every apostrophe and every single and double quotation mark is dropped from the string:

    "thats awful, Find – Best Quotes, Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything. Plato."

Upvotes: 1

Views: 1256

Answers (1)

devssh

Reputation: 1196

I tried everything I could think of, but I could not fix it by hand. A simpler way to write what you tried is s.encode('cp1252', "ignore").decode("utf-8", "ignore"). I tried latin1, ascii, cp1252, utf8 and utf16 in various combinations, going one by one through the list of Python standard encodings, and gave up. Then I looked for code that could detect the right encoding more intelligently.
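To see why the encode/decode round trip works on some strings and fails on yours, here is a minimal sketch using only the standard library. The variable names are mine, and the short sample strings are stand-ins for the full quote.

```python
# U+2019 (right single quote) is the three bytes E2 80 99 in UTF-8.
original = "that\u2019s awful"

# Mojibake: those UTF-8 bytes get decoded as cp1252 by mistake.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # that’s awful

# Undoing the two steps in reverse order recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True

# Case 2 in the question: U+2015 has no cp1252 byte, so strict encoding fails.
try:
    "\u2015 Plato".encode("cp1252")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Case 1 in the question: a genuine U+2019 becomes byte 0x92 in cp1252,
# and 0x92 on its own is not valid UTF-8.
try:
    "that\u2019s".encode("cp1252").decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

print(strict_encode_failed, strict_decode_failed)  # True True
```

So the whole-string round trip only works when every non-ASCII character in the string is mojibake. Judging by the two errors, your string seems to mix garbled characters with genuine ones.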

Then I came across a blog post that explains the things that can go wrong when fixing an encoding. The solution it proposes is to search over the possible encodings to find the one that repairs the text.
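A rough sketch of that search idea, using only the standard library. Note that candidate_repairs and its short list of encodings are my own illustration, not ftfy's API or its actual algorithm, and the sketch assumes the text was UTF-8 that got decoded with the wrong single-byte codec.

```python
def candidate_repairs(text, encodings=("cp1252", "latin-1", "cp437", "mac_roman")):
    """Return {encoding: repaired_text} for each encoding whose round trip succeeds.

    Hypothetical helper: re-encode the text with each candidate codec and
    try to read the resulting bytes as UTF-8.
    """
    repairs = {}
    for enc in encodings:
        try:
            fixed = text.encode(enc).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this codec cannot explain the garbling
        if fixed != text:
            repairs[enc] = fixed
    return repairs


# An en dash (U+2013) whose UTF-8 bytes were read as cp1252.
garbled = "Find \u00e2\u20ac\u201c Best Quotes"
result = candidate_repairs(garbled)
print(result.get("cp1252"))
```

A brute-force search like this still needs some way to rank the candidates when more than one round trip succeeds, which is the hard part a dedicated library takes care of.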

This package is called ftfy.

Disclaimer: I am not affiliated with ftfy. I only came across it today.

pip install ftfy

s = "that’s awful, Find – Best Quotes, “Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything.” ― Plato."

import ftfy

print(ftfy.fix_text(s))

Output:

that's awful, Find – Best Quotes, "Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything." ― Plato.

This solves the problem. For more on how it works, see the ftfy source code or documentation. :)
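If you would rather not add a dependency, one stdlib-only approach is to repair only the runs that look like UTF-8 read as cp1252 and leave everything else alone. This is a simplified sketch under that single assumption, not how ftfy works internally; repair_runs and the helper names are hypothetical.

```python
import re


def _cp1252_chars(first, last):
    """Characters that cp1252 assigns to the byte range first..last."""
    chars = []
    for byte in range(first, last + 1):
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            pass  # a few bytes (e.g. 0x81, 0x9D) are undefined in cp1252
    return "".join(chars)


# How UTF-8 lead bytes and continuation bytes look after a cp1252 decode.
LEAD = _cp1252_chars(0xC2, 0xF4)
CONT = _cp1252_chars(0x80, 0xBF)
MOJIBAKE = re.compile("[{}][{}]{{1,3}}".format(re.escape(LEAD), re.escape(CONT)))


def repair_runs(text):
    """Repair each suspicious run on its own; leave the rest of the text alone."""
    def fix(match):
        run = match.group()
        try:
            return run.encode("cp1252").decode("utf-8")
        except UnicodeError:
            return run  # not valid UTF-8 after all; keep it unchanged
    return MOJIBAKE.sub(fix, text)


# Garbled quote and dash next to a genuine U+2015 horizontal bar.
sample = "that\u00e2\u20ac\u2122s awful, Find \u00e2\u20ac\u201c Best Quotes \u2015 Plato"
print(repair_runs(sample))
```

Because each run is handled separately, the genuine U+2015 never goes through the cp1252 encoder, so it cannot trigger the errors from the question. A run that does not decode cleanly, such as a closing quote that lost its last byte, is left as it was.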

Upvotes: 1
