Timmy
Timmy

Reputation: 12828

Python lxml.html linebreaks?

Im using lxml.html.cleaner to clean html from an input text. how can i change \n to <br /> in lxml.html?

Upvotes: 0

Views: 514

Answers (1)

Tim McNamara
Tim McNamara

Reputation: 18385

Fairly easy, slightly hacky way: You could do this as part of a two step process, assuming you have used lxml.html.parse or whichever method to build DOM.

  1. iterate through the text and tail attributes of the nodes with string replacements. Look at the iterdescendants method, which walks through everything for you.
  2. lxml.html.clean as per normal

A more complex way would be to monkey patch the lxml.html.clean module. Unlike lots of lxml, this module is written in Python and is fairly accessible. For example, there is currently a _substitute_whitespace function.

Upvotes: 1

Related Questions