Reputation: 4664
I have the following string...
"Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
I need to turn it into this string...
Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
This is pretty standard HTML encoding, and I can't for the life of me figure out how to convert it in Python.
I found this: GitHub
It's very close to working, but instead of an apostrophe it outputs some odd Unicode character.
Here is an example of the output from the GitHub script...
Scam, hoax, or the real deal, heâs gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
Upvotes: 1
Views: 1904
Reputation: 8724
What you're trying to do is called "HTML entity decoding", and it's covered in a number of past Stack Overflow questions.
Here's a code snippet using the Beautiful Soup HTML parsing library to decode your example:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup
string = "Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
# BeautifulSoup 3 (Python 2) API: convertEntities decodes HTML entities while parsing
s = BeautifulSoup(string, convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0]
print s
Here's the output:
Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
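If you're on Python 3, the standard library's html.unescape should handle this without a third-party parser. Here's a minimal sketch. Note that the input below assumes the apostrophe was encoded as the numeric reference &#8217; (that exact entity is my assumption, since the string as displayed here already shows the decoded character), and the variable names are just for illustration:

```python
import html

# Assumed input: the sentence from the question with the apostrophe written
# as the numeric character reference &#8217; (an assumption for illustration).
encoded = (
    "Scam, hoax, or the real deal, he&#8217;s gonna work his way to the "
    "bottom of the sordid tale, and hopefully end up with an arcade game "
    "in the process."
)

# html.unescape converts named and numeric character references to Unicode.
decoded = html.unescape(encoded)
print(decoded)
```

As for the "â" in the GitHub script's output: that is most likely the right character (U+2019) written out as UTF-8 bytes and then displayed as Latin-1, so the decoding itself may be fine and the problem is the encoding used when printing.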
Upvotes: 4