Reputation: 3
I am a fresh pythoner, thank for help me.
I just want to make the Escape Sequence to Character Entities, like the <
change to <
, but one HTML page have many different Escape Sequence, I can not write many replace statement,like:
str = str.replace(' ', ' ')
...............many code.........
str = str.replace('<', '<')
str = str.replace('>', '>')
It is so long....I just want to have a fun or def, that can make the problem easily. Thank you very much
Upvotes: 0
Views: 145
Reputation: 369454
>>> from HTMLParser import HTMLParser
>>> # from html.parser import HTMLParser # In Python 3.x
>>>
>>> parser = HTMLParser()
>>> parser.unescape('>_<')
u'>_<'
>>> parser.unescape('012')
u'012'
NOTE: HTMLParser.unescape(' ')
returns NO-BREAK SPACE (U+00A0) instead of SPACE.
>>> parser.unescape(' ')
u'\xa0'
BTW, Don't use str
as a variable name, it shadows a builtin function str
.
Upvotes: 2