671coder

Reputation: 3

How can I convert HTML escape sequences (character entities) to characters in Python?

I'm new to Python; thanks in advance for any help. I want to convert HTML escape sequences (character entities) into the characters they represent, for example &lt; into <. A single HTML page can contain many different entities, and I don't want to write a separate replace statement for each one, like:

str = str.replace('&nbsp;', ' ')

# ... many more lines like this ...

str = str.replace('&lt;', '<')
str = str.replace('&gt;', '>')

This gets very long. Is there a function that handles all of them at once? Thank you very much.

Upvotes: 0

Views: 145

Answers (1)

falsetru

Reputation: 369454

Use HTMLParser.HTMLParser:

>>> from HTMLParser import HTMLParser
>>> # from html.parser import HTMLParser # In Python 3.x (unescape() was removed in 3.9; use html.unescape there)
>>> 
>>> parser = HTMLParser()
>>> parser.unescape('&gt;_&lt;')
u'>_<'
>>> parser.unescape('&#48;&#49;&#x32;')
u'012'
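
On Python 3.4 and later, the standard library also offers html.unescape as a plain function, so no parser object is needed. A minimal sketch, assuming Python 3.4+; the variable names text and decoded are only illustrative:

```python
import html

# Named and numeric (decimal and hex) entities are decoded in one call.
text = '&gt;_&lt; &#48;&#49;&#x32;'
decoded = html.unescape(text)
print(decoded)  # >_< 012
```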

NOTE: HTMLParser.unescape('&nbsp;') returns a NO-BREAK SPACE (U+00A0), not a regular SPACE (U+0020).

>>> parser.unescape('&nbsp;')
u'\xa0'
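
If you really want a regular space there, as in the question's replace('&nbsp;', ' '), one option is to replace U+00A0 after unescaping. A rough sketch, assuming Python 3.4+ (where html.unescape is available); unescape_with_plain_spaces is a hypothetical helper name, not part of any library:

```python
import html

def unescape_with_plain_spaces(text):
    """Decode HTML entities, then turn NO-BREAK SPACE (U+00A0) into a regular space."""
    return html.unescape(text).replace('\xa0', ' ')

print(repr(unescape_with_plain_spaces('a&nbsp;&lt;&nbsp;b')))  # 'a < b'
```

Note that this also converts any no-break spaces that were already in the text as literal characters, which may or may not be what you want.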

BTW, don't use str as a variable name; it shadows the built-in str type.

Upvotes: 2
