web_ou

Reputation: 11

How to split HTML into XML sections with XPath?

I would like to convert an HTML product sheet into an XML (DocBook) document. The problem is splitting my HTML descriptions into DocBook <simplesect> sections.

I would like to transform this (HTML):

<div class="description">
    <h3>Title 1</h3>
    <p>Paragraph one</p>
    <p>Paragraph two</p>
    ...
    <figure>
        ...
    </figure>
    ...
    <p>Paragraph three</p>
    <h3>Title 2</h3>
    <p>Paragraph one</p>
    ...
    <figure>
        ...
    </figure>
    <p>Paragraph two</p>
    <p>Paragraph three</p>
    ...
</div>

to this (DocBook XML):

<section>
    <title>My Main Title</title>
    <simplesect>
        <title>Title 1</title>
        <para>Paragraph one</para>
        <para>Paragraph two</para>
        ...
        <mediaobject>
            ...
        </mediaobject>
        ...
        <para>Paragraph three</para>
    </simplesect>
    <simplesect>
        <title>Title 2</title>
        <para>Paragraph one</para>
        ...
        <mediaobject>
            ...
        </mediaobject>
        <para>Paragraph two</para>
        <para>Paragraph three</para>
    </simplesect>
</section>

I've tried to select all the elements between the h3 tags using the following-sibling axis and other methods, without success.

How can I find the right XPath expression?
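
For reference, here is a minimal sketch of the grouping I am after, written in Python with the standard-library xml.etree.ElementTree rather than as a pure XPath solution. It assumes the HTML fragment is well-formed XML, that each h3 opens a new group, and that anything before the first h3 can be dropped. The TAG_MAP table and the split_sections helper are hypothetical names used only for illustration, and nested content (such as the inside of figure) is not converted. In XPath 1.0 terms, the members of the n-th group would presumably be the children of the div matching `*[not(self::h3)][count(preceding-sibling::h3) = n]`.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the HTML sample from the question.
HTML = """<div class="description">
    <h3>Title 1</h3>
    <p>Paragraph one</p>
    <p>Paragraph two</p>
    <figure/>
    <p>Paragraph three</p>
    <h3>Title 2</h3>
    <p>Paragraph one</p>
    <figure/>
    <p>Paragraph two</p>
    <p>Paragraph three</p>
</div>"""

# Hypothetical HTML -> DocBook tag mapping (an assumption, not a full mapping).
TAG_MAP = {"p": "para", "figure": "mediaobject"}


def split_sections(div, main_title):
    """Group the children of `div` into one <simplesect> per <h3>."""
    section = ET.Element("section")
    ET.SubElement(section, "title").text = main_title
    current = None
    for child in div:
        if child.tag == "h3":
            # Each h3 starts a new simplesect and becomes its title.
            current = ET.SubElement(section, "simplesect")
            ET.SubElement(current, "title").text = child.text
        elif current is not None:
            # Shallow conversion: only the tag name and direct text are copied.
            converted = ET.SubElement(current, TAG_MAP.get(child.tag, child.tag))
            converted.text = child.text
        # Children that appear before the first h3 are skipped.
    return section


section = split_sections(ET.fromstring(HTML), "My Main Title")
print(ET.tostring(section, encoding="unicode"))
```

The single pass keeps a pointer to the current simplesect, which avoids having to compute "everything up to the next h3" for each heading separately.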
