Alexander Kuzmin

Reputation: 1120

Python lxml, removing parent elements before outputting HTML (using fragment_fromstring)

I'm using lxml to parse some HTML fragments (from an RSS feed), and in order to do this efficiently I pass create_parent='div' to fragment_fromstring. When I later output the HTML, I don't want the parent div to be included: with my HTML layout it ends up being a div inside a div, which is totally unnecessary.

The code as it is now:

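import lxml.html
from lxml.html import fragment_fromstring

# Wrap the fragment in a <div> so it parses as a single element
html = fragment_fromstring(html_string, create_parent='div')

# Strip class and id attributes from every element
for tag in html.xpath('//*[@class]'):
    tag.attrib.pop('class')
for tag in html.xpath('//*[@id]'):
    tag.attrib.pop('id')

return lxml.html.tostring(html)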

TL;DR: How do I remove the wrapping div from the output?

Upvotes: 4

Views: 1291

Answers (1)

falsetru

Reputation: 369074

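Serialize the child elements individually instead of the wrapper element. On Python 3, tostring returns bytes by default, so pass encoding='unicode' to get str values that '\n'.join can accept:

return '\n'.join(lxml.html.tostring(x, encoding='unicode') for x in html.iterchildren())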
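For reference, here is a minimal self-contained sketch of the same idea, assuming Python 3 and a recent lxml. The function name inner_html is hypothetical and not part of lxml. It also keeps any text that appears before the first child element, which iterating over the children alone would drop.

```python
from html import escape

from lxml.html import fragment_fromstring, tostring


def inner_html(html_string):
    """Hypothetical helper: clean a fragment and return it without the wrapper div."""
    # The wrapper div exists only so that the fragment parses as one element.
    root = fragment_fromstring(html_string, create_parent='div')

    # Drop class and id attributes from every element in the fragment.
    for el in root.iter():
        el.attrib.pop('class', None)
        el.attrib.pop('id', None)

    # Text before the first child lives in root.text, so add it back (escaped).
    leading = escape(root.text, quote=False) if root.text else ''

    # tostring includes each child's tail text by default (with_tail=True);
    # encoding='unicode' makes it return str instead of bytes.
    return leading + ''.join(tostring(child, encoding='unicode') for child in root)
```

Calling inner_html(html_string) in place of the final tostring(html) should give the cleaned markup without the outer div. Joining with an empty string rather than '\n' avoids adding whitespace that was not in the source fragment.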

Upvotes: 2
