gkpo

Reputation: 2672

How to wrap all contents of a tag?

I have the following XML portion:

<table>
  <tr>
    <td>Hello</td>
    <td>Hello</td>
    <td>
      <p>Hello already in P</p>
    </td>
    <td>
      This one has some naked text
      <span>and some span wrapped text</span>
    </td>
  </tr>
</table>

I would like to wrap the contents of each cell in a p tag, unless they are already wrapped in one, so that the output is:

<table>
  <tr>
    <td><p>Hello</p></td>
    <td><p>Hello</p></td>
    <td>
      <p>Hello already in P</p>
    </td>
    <td>
      <p>
        This one has some naked text
        <span>and some span wrapped text</span>
      </p>
    </td>
  </tr>
</table>

I'm using lxml etree in my project but the library doesn't seem to have a "wrap" method or something similar.

Now I'm thinking this may be a job for an XSLT transformation, but I'd like to avoid adding another layer of complexity and more dependencies to my Python project.

The contents of a td can be nested to any depth.

Upvotes: 0

Views: 487

Answers (1)

DelboyJay

Reputation: 2869

I don't use the lxml package myself, but try the following:

from lxml import etree

def wrap(root):
    # Find <td> elements that do not have a <p> child element
    cells = etree.XPath("//td[not(p)]")(root)
    for cell in cells:
        # Create the new <p> element
        e = etree.Element("p")
        # Copy the cell's leading text into the <p> element
        e.text = cell.text
        # Clear the cell's text because it now lives in the <p> element
        cell.text = None
        # Move the cell's children into the <p> element
        # (so the <span> in the last cell of the input ends up nested).
        # Iterate over a copy, since append() moves each child out of the cell.
        for child in list(cell):
            # lxml moves the child (and its tail text) from the <td> to the <p>
            e.append(child)
        # Make the new <p> element the cell's only child
        cell.append(e)
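
To check the approach against the XML from the question, here is a minimal self-contained sketch. The names wrap_cells and SAMPLE are made up for this illustration, and it assumes lxml is installed. It relies on lxml moving an element (together with its tail text) when it is appended to a new parent.

```python
from lxml import etree

# The input from the question (SAMPLE is a hypothetical name for this sketch)
SAMPLE = """<table>
  <tr>
    <td>Hello</td>
    <td>Hello</td>
    <td>
      <p>Hello already in P</p>
    </td>
    <td>
      This one has some naked text
      <span>and some span wrapped text</span>
    </td>
  </tr>
</table>"""


def wrap_cells(root):
    """Wrap the contents of every <td> that has no <p> child in a new <p>."""
    for cell in root.xpath("//td[not(p)]"):
        p = etree.Element("p")
        # Move the cell's leading text into the new <p>
        p.text = cell.text
        cell.text = None
        # list(cell) is a snapshot, so moving children does not disturb the loop
        for child in list(cell):
            p.append(child)  # lxml detaches the child from the <td> here
        cell.append(p)
    return root


root = etree.fromstring(SAMPLE)
wrap_cells(root)
print(etree.tostring(root, encoding="unicode"))
```

Note that the XPath only skips cells that have a p as a direct child, so a cell mixing naked text with an existing p would be left untouched. Whitespace in the output is carried over from the input rather than re-indented.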

Upvotes: 1
