I have the following XML portion:
<table>
<tr>
<td>Hello</td>
<td>Hello</td>
<td>
<p>Hello already in P</p>
</td>
<td>
This one has some naked text
<span>and some span wrapped text</span>
</td>
</tr>
</table>
I would like to wrap the contents of each cell in a p tag, unless they are already wrapped in one, so that the output is:
<table>
<tr>
<td><p>Hello</p></td>
<td><p>Hello</p></td>
<td>
<p>Hello already in P</p>
</td>
<td>
<p>
This one has some naked text
<span>and some span wrapped text</span>
</p>
</td>
</tr>
</table>
I'm using lxml's etree in my project, but the library doesn't seem to have a "wrap" method or anything similar.
I'm now thinking this might be a job for an XSLT transformation, but I'd like to avoid adding another layer of complexity and more dependencies to my Python project.
The contents of the td elements can be nested to any depth.
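For reference, lxml already includes XSLT 1.0 support (through libxslt) as `etree.XSLT`, so the XSLT route would not necessarily pull in another package. A minimal sketch, assuming the sample table above as input: an identity transform copies everything unchanged, and one extra template wraps the child nodes of every `td` that has no direct `p` child. The names `WRAP_XSLT` and `SOURCE` are illustrative only.

```python
from lxml import etree

# Identity transform plus one override: a <td> with no direct <p> child is
# copied with its attributes, and all of its child nodes go inside a new <p>.
# The td[not(p)] pattern has a higher default priority than node(), so it wins.
WRAP_XSLT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="td[not(p)]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <p><xsl:apply-templates select="node()"/></p>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# The sample input from the question.
SOURCE = b"""<table>
<tr>
<td>Hello</td>
<td>Hello</td>
<td>
<p>Hello already in P</p>
</td>
<td>
This one has some naked text
<span>and some span wrapped text</span>
</td>
</tr>
</table>"""

transform = etree.XSLT(etree.XML(WRAP_XSLT))
result = transform(etree.XML(SOURCE))
print(str(result))
```

Whether this is simpler than a few lines of tree manipulation in Python is a matter of taste; it does keep the "what to wrap" rule in a single XPath pattern.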
I don't use the lxml package myself, but try the following:
from lxml import etree

def wrap(root):
    # Find <td> elements that have no direct <p> child
    cells = etree.XPath("//td[not(p)]")(root)
    for cell in cells:
        # Create a new <p> element
        e = etree.Element("p")
        # Give the <p> element the text that comes before the cell's first child
        e.text = cell.text
        # Clear the cell's text because it now lives in the <p> element
        cell.text = None
        # Move the cell's children into the <p> element
        # (so that the <span> in the last cell ends up nested inside it).
        # list() takes a snapshot, because appending moves each child out of the cell.
        for child in list(cell):
            # In lxml this moves the child (and its tail text) from <td> to <p>
            e.append(child)
        # Add the new <p> element as the cell's only child
        cell.append(e)
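Since lxml has no built-in wrap method, the loop above can be factored into a small reusable helper. This is a sketch under the assumption that lxml is installed; `wrap_contents` and `wrap_bare_cells` are hypothetical names, not lxml API, and the script runs them on the question's sample input.

```python
from lxml import etree


def wrap_contents(parent, tag="p"):
    """Move all of parent's content into a new <tag> child and return it."""
    wrapper = etree.Element(tag)
    # The text before the first child belongs to the parent itself.
    wrapper.text, parent.text = parent.text, None
    # Snapshot the children: appending moves each one (with its tail text)
    # out of parent and into wrapper.
    for child in list(parent):
        wrapper.append(child)
    parent.append(wrapper)
    return wrapper


def wrap_bare_cells(root):
    # not(p) checks direct children only.
    for cell in root.xpath("//td[not(p)]"):
        wrap_contents(cell)
    return root


SOURCE = b"""<table>
<tr>
<td>Hello</td>
<td>Hello</td>
<td>
<p>Hello already in P</p>
</td>
<td>
This one has some naked text
<span>and some span wrapped text</span>
</td>
</tr>
</table>"""

root = wrap_bare_cells(etree.fromstring(SOURCE))
print(etree.tostring(root, encoding="unicode"))
```

The `not(p)` test matches "not already wrapped in a p tag" literally. If a cell whose `p` sits deeper (for example inside a `div`) should also be left alone, `//td[not(.//p)]` would be the variant to try.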