RadiantHex

Reputation: 25547

From escaped html -> to regular html? - Python

I used BeautifulSoup to handle XML files that I have collected through a REST API.

The responses contain HTML code, but BeautifulSoup escapes all the HTML tags so it can be displayed nicely.

Unfortunately I need the HTML code.


How would I go about transforming the escaped HTML back into proper markup?


Help would be very much appreciated!

Upvotes: 8

Views: 5537

Answers (2)

Nathan Osman

Reputation: 73155

You could try the urllib module?

It has a method unquote() that might suit your needs.

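Edit: on second thought (and after rereading your question), unquote() only decodes URL percent-escapes, not HTML entities, so you might just want to use the string's replace() method.

Like so:

# replace() returns a new string, so assign the result back
s = s.replace('&lt;', '<')
s = s.replace('&gt;', '>')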
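For what it's worth, here is a minimal sketch of the replace-based idea wrapped in a function. The name unescape_basic and the sample string are made up for illustration, and it only covers the three core entities. The one detail worth noting is that &amp; should probably be handled last, otherwise text that was escaped twice gets unescaped twice.

```python
def unescape_basic(text):
    """Undo one level of escaping for &lt;, &gt; and &amp; (hypothetical helper)."""
    # str.replace() returns a new string, so the calls are chained.
    # &amp; is handled last; doing it first would turn '&amp;lt;' into '<'
    # instead of the literal text '&lt;'.
    return (text.replace('&lt;', '<')
                .replace('&gt;', '>')
                .replace('&amp;', '&'))


print(unescape_basic('&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;'))
```

Anything beyond those three entities (&quot;, numeric references and so on) is left untouched by this sketch.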

Upvotes: 2

Alex Martelli

Reputation: 881537

I think you want xml.sax.saxutils.unescape from the Python standard library.

E.g.:

>>> from xml.sax import saxutils as su
>>> s = '&lt;foo&gt;bar&lt;/foo&gt;'
>>> su.unescape(s)
'<foo>bar</foo>'
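
A note on coverage, offered as a sketch rather than a definitive recipe: by default saxutils.unescape only translates &amp;lt;, &amp;gt; and &amp;amp;. Other entities can be supplied through its optional second argument, a dictionary mapping entity to replacement. If you happen to be on Python 3.4 or later (an assumption about your environment), html.unescape from the standard library also resolves named and numeric character references. The sample string below is invented for illustration.

```python
import html
from xml.sax import saxutils as su

# Made-up sample containing a quote entity and a numeric character reference.
s = '&lt;a href=&quot;x&quot;&gt;caf&#233;&lt;/a&gt;'

# Default behaviour: only &lt;, &gt; and &amp; are translated.
basic = su.unescape(s)

# Extra entities are passed as a mapping from entity to replacement text.
with_quotes = su.unescape(s, {'&quot;': '"', '&apos;': "'"})

# html.unescape (Python 3.4+) handles named and numeric references too.
full = html.unescape(s)

print(basic)
print(with_quotes)
print(full)
```

Which one fits depends on what the API actually escaped; if it is only the angle brackets and ampersands, the plain saxutils call is probably enough.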

Upvotes: 20
