Eugene

Reputation: 963

lxml: Force to convert newlines to entities

Is there a way to output newlines inside text elements as &#13; entities? Currently, newlines are inserted into the output as-is:

from lxml import etree
from lxml.builder import E
etree.tostring(E.a('one\ntwo'), pretty_print=True)
b'<a>one\ntwo</a>\n'

Desired output:

b'<a>one&#13;two</a>\n'

Upvotes: 0

Views: 420

Answers (1)

supersam654

Reputation: 3234

After looking through the lxml docs, I don't see a way to force specific characters to be serialized as escaped entities. The set of characters that does get escaped also appears to vary with the output encoding.

With all of that said, I'd use BeautifulSoup's prettify() on top of lxml to get the job done:

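from bs4 import BeautifulSoup as Soup
from xml.sax.saxutils import escape

def extra_entities(s):
    # Escape &, < and > as usual, then write newlines as a character reference.
    return escape(s).replace('\n', '&#13;')

# 'lxml-xml' selects lxml's XML parser as the backend.
soup = Soup("<a>one\ntwo</a>", 'lxml-xml')
print(soup.prettify(formatter=extra_entities))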

Output:

<?xml version="1.0" encoding="utf-8"?>
<a>
 one&#13;two
</a>

Note that a newline (\n) should strictly map to &#10;, while &#13; is the carriage return (\r). I've used &#13; in the formatter because that is what the question asks for, and I can't test the FCPXML format locally to check which one it expects.
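
If pulling in BeautifulSoup is not an option, a different workaround is a placeholder swap: replace each newline in the tree's text with a marker, serialize, then replace the marker with the entity. The sketch below is only an illustration under some assumptions: it uses the standard library's xml.etree.ElementTree so it runs without extra packages (lxml.etree offers the same iter()/text/tail attributes, so the idea should carry over), and PLACEHOLDER and tostring_with_newline_entities are names I made up, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker: must never occur in the real data and must not
# contain characters the serializer would escape (&, <, >).
PLACEHOLDER = "__NEWLINE_ENTITY__"

def tostring_with_newline_entities(root, entity="&#10;"):
    """Serialize root, writing newlines in text and tail nodes as entity.

    Note: this modifies the tree in place (newlines become the marker).
    """
    for el in root.iter():
        if el.text:
            el.text = el.text.replace("\n", PLACEHOLDER)
        if el.tail:
            el.tail = el.tail.replace("\n", PLACEHOLDER)
    xml = ET.tostring(root, encoding="unicode")
    # Swap the marker for the entity only after serialization,
    # so the ampersand in the entity is not escaped again.
    return xml.replace(PLACEHOLDER, entity)

root = ET.fromstring("<a>one\ntwo</a>")
print(tostring_with_newline_entities(root, entity="&#13;"))
```

Because the swap happens on the tree's text rather than on the serialized string, newlines that the serializer itself adds between tags are left alone. Whitespace-only text that already exists in the tree (for example indentation from a previously pretty-printed file) would be converted too, so you may want to strip that first.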

Upvotes: 3
