Reputation: 119
I need to convert a string into a human-readable format.
s = "that’s awful, Find – Best Quotes, “Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything.” ― Plato."
I want to convert this string to "that's awful, Find - Best Quotes, "Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything." ― Plato."
But I'm running into a different problem with each approach I try.
When I use print(str(s.encode('cp1252',"ignore"),'utf-8'))
I get
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 4
When I use print(str(s.encode('cp1252'),'utf-8',"ignore"))
I get
UnicodeEncodeError: 'charmap' codec can't encode character '\u2015' in position 151
When I use print(str(s.encode('cp1252',"ignore"),'utf-8',"ignore"))
then, as expected, there is no error, but every apostrophe and every single and double quotation mark is dropped:
"thats awful, Find – Best Quotes, Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything. Plato."
Upvotes: 1
Views: 1256
Reputation: 1196
I tried everything I could think of but could not fix it by hand. A simpler way to write the kind of attempt you made is s.encode('utf-8', "ignore").decode('utf-8', "ignore"). I tried latin1, ascii, cp1252, utf8 and utf16 in various combinations and gave up. I also tried the encodings one by one from the list of standard encodings in the Python documentation. Then I looked for code that could detect the right encoding more intelligently.
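For reference, the manual search described above can be sketched with the standard library alone. This is only an illustration of the trial-and-error approach, not how ftfy works; the function name roundtrip_candidates and the short CANDIDATES list are made up for this example. It also shows why the approach is fragile: several pairs "succeed" on the same input, and only one of them gives sensible text.

```python
from itertools import permutations

# Hypothetical short list of encodings to try; extend as needed.
CANDIDATES = ["cp1252", "latin-1", "utf-8"]

def roundtrip_candidates(text, encodings=CANDIDATES):
    """Try every (encode, decode) pair and keep the ones that run
    without error and actually change the text."""
    results = {}
    for wrong, right in permutations(encodings, 2):
        try:
            repaired = text.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot represent the text
        if repaired != text:
            results[(wrong, right)] = repaired
    return results

# "\u00e2\u20ac\u2122" is the three-character mojibake for U+2019 (a curly apostrophe).
sample = "that\u00e2\u20ac\u2122s awful"
results = roundtrip_candidates(sample)
for pair, repaired in results.items():
    print(pair, repr(repaired))
# The ("cp1252", "utf-8") entry is 'that\u2019s awful'; the other entries are worse than the input.
```

On the full string from the question even the cp1252/utf-8 pair fails, because U+2015 has no cp1252 encoding, which is the UnicodeEncodeError shown in the question.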
Then I came across a blog post that explains everything that can go wrong when fixing an encoding. The solution it proposed was to run a full search over all encodings to find the correct one.
The package that does this is called ftfy.
Disclaimer: I am not affiliated with ftfy. I just came across it today.
pip install ftfy
s = "that’s awful, Find – Best Quotes, “Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything.” ― Plato."
import ftfy
print(ftfy.fix_text(s))
that's awful, Find – Best Quotes, "Music gives a soul to the universe, wings to the mind, flight to the imagination and life to everything." ― Plato.
This solves the problem. For more on how the fix works, see the ftfy source code or its documentation. :)
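One caveat: the output above still contains the en dash, while the question asked for a plain hyphen. A minimal sketch of a follow-up step, using only str.translate from the standard library, could look like this. The table ASCII_PUNCTUATION and the helper to_ascii_punctuation are names invented for this example, and which characters to map is an assumption about what the asker wants.

```python
# Map common "smart" punctuation to ASCII equivalents.
# The horizontal bar (U+2015) is left alone, as in the desired output in the question.
ASCII_PUNCTUATION = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})

def to_ascii_punctuation(text):
    """Replace curly quotes and en/em dashes with ASCII characters."""
    return text.translate(ASCII_PUNCTUATION)

print(to_ascii_punctuation("Find \u2013 Best Quotes, \u201cMusic\u201d"))
# prints: Find - Best Quotes, "Music"
```

This would be applied to the result of ftfy.fix_text(s), after the encoding itself has been repaired.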
Upvotes: 1