Sojimanatsu

Reputation: 601

How do we convert nested lists in a Microsoft Word DOCX file to HTML with XSLT?

<w:p w:rsidR="008845A9" w:rsidRPr="001509B0" w:rsidRDefault="008845A9" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
        </w:pPr>
    </w:p>
    <w:p w:rsidR="001207E2" w:rsidRPr="001509B0" w:rsidRDefault="001207E2" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="1"/>
                <w:numId w:val="5"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t>First Item</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00AD36E6" w:rsidRPr="001509B0" w:rsidRDefault="00AD36E6" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="1"/>
                <w:numId w:val="5"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t>Second Item</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="00AD36E6" w:rsidRPr="001509B0" w:rsidRDefault="00AD36E6" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="1"/>
                <w:numId w:val="5"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t>Third Item</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="002B7A91" w:rsidRPr="001509B0" w:rsidRDefault="002B7A91" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="2"/>
                <w:numId w:val="5"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t>Third Item – One</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="002B7A91" w:rsidRPr="001509B0" w:rsidRDefault="002B7A91" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="2"/>
                <w:numId w:val="5"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t xml:space="preserve">Third Item </w:t>
        </w:r>
        <w:r w:rsidR="006551A3" w:rsidRPr="001509B0">
            <w:t>–</w:t>
        </w:r>
        <w:r w:rsidRPr="001509B0">
            <w:t xml:space="preserve"> Two</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="006551A3" w:rsidRPr="001509B0" w:rsidRDefault="006551A3" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="3"/>
                <w:numId w:val="6"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t xml:space="preserve">Sample Item </w:t>
        </w:r>
        <w:r w:rsidR="00554D9D" w:rsidRPr="001509B0">
            <w:t>A</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="006551A3" w:rsidRPr="001509B0" w:rsidRDefault="00554D9D" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="3"/>
                <w:numId w:val="6"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t>Sample Item B</w:t>
        </w:r>
    </w:p>
    <w:p w:rsidR="002B7A91" w:rsidRPr="001509B0" w:rsidRDefault="002B7A91" w:rsidP="004E414C">
        <w:pPr>
            <w:pStyle w:val="AppBody-Description"/>
            <w:numPr>
                <w:ilvl w:val="1"/>
                <w:numId w:val="5"/>
            </w:numPr>
        </w:pPr>
        <w:r w:rsidRPr="001509B0">
            <w:t>Fo</w:t>
        </w:r>
        <w:r w:rsidR="00565721" w:rsidRPr="001509B0">
            <w:t>u</w:t>
        </w:r>
        <w:r w:rsidRPr="001509B0">
            <w:t>rth Item</w:t>
        </w:r>
    </w:p>

Hi, this is part of the XML from a Microsoft Word DOCX file. It contains nested lists, which should look like this:

1.First Item
2.Second Item
3.Third Item
    i.Third Item – One
    ii.Third Item – Two
       a.Sample Item A
       b.Sample Item B
4.Fourth Item

Instead of this, I get a flat result like:

•First Item
•Second Item
•Third Item
•Third Item – One
•Third Item – Two
•Sample Item A
•Sample Item B
•Fourth Item

This was my attempt in XSLT, using <ul> and <li>, but I guess I need something different here. I don't know what to do. The rest is fine and I can handle the tables and so on, but the nested lists are a problem now.

<xsl:output method="html" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title/>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
        </xsl:template>
        <xsl:template match="w:p">
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Title']">
                <h1>
                    <xsl:apply-templates select="w:r/w:t"/>
                </h1>
            </xsl:if>
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Description']">
                    <xsl:choose>
                        <xsl:when test="w:pPr/w:numPr">
                        <ul>
                            <li><xsl:apply-templates select="w:r/w:t"/></li>
                        </ul>
                        </xsl:when>
                        <xsl:otherwise>
                        <p>
                            <xsl:apply-templates select="w:r/w:t"/>
                        </p>
                        </xsl:otherwise>
                    </xsl:choose>
            </xsl:if>
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Claim']">
                <p>
                    <xsl:apply-templates select="w:r/w:t"/>
                </p>
            </xsl:if>
            <xsl:if test="w:pPr/w:spacing[@w:line='360']">
                <p>
                    <xsl:apply-templates select="w:r/w:t"/>
                </p>
            </xsl:if>
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Heading']">
                <h2>
                    <xsl:apply-templates select="w:r/w:t"/>
                </h2>
            </xsl:if>
        </xsl:template>
</xsl:stylesheet>

*I found a solution:*

<xsl:output method="html" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title/>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
        </xsl:template>
        <xsl:template match="w:p">
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Title']">
                <h1>
                    <xsl:apply-templates select="w:r/w:t"/>
                </h1>
            </xsl:if>
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Description']">
                    <xsl:choose>
                        <xsl:when test="w:pPr/w:numPr/w:ilvl[@w:val='1']">
                        <ul>
                            <li><xsl:apply-templates select="w:r/w:t"/></li>
                        </ul>
                        </xsl:when>
                        <xsl:when test="w:pPr/w:numPr/w:ilvl[@w:val='2']">
                        <ul>
                            <ul>
                                <li><xsl:apply-templates select="w:r/w:t"/></li>
                            </ul>
                        </ul>
                        </xsl:when>
                        <xsl:when test="w:pPr/w:numPr/w:ilvl[@w:val='3']">
                        <ul>
                            <ul>
                                <ul>
                                    <li><xsl:apply-templates select="w:r/w:t"/></li>
                                </ul>
                            </ul>
                        </ul>
                        </xsl:when>
                        <xsl:otherwise>
                        <p>
                            <xsl:apply-templates select="w:r/w:t"/>
                        </p>
                        </xsl:otherwise>
                    </xsl:choose>
            </xsl:if>
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Claim']">
                <p>
                    <xsl:apply-templates select="w:r/w:t"/>
                </p>
            </xsl:if>
            <xsl:if test="w:pPr/w:spacing[@w:line='360']">
                <p>
                    <xsl:apply-templates select="w:r/w:t"/>
                </p>
            </xsl:if>
            <xsl:if test="w:pPr/w:pStyle[@w:val='AppBody-Heading']">
                <h2>
                    <xsl:apply-templates select="w:r/w:t"/>
                </h2>
            </xsl:if>
        </xsl:template>
</xsl:stylesheet>

But this is not an automatic solution. For example, if a different document has another level of nesting, this is not going to work.

How do we select the level numbers "1" to ".." automatically?

Upvotes: 1

Views: 535

Answers (1)

Michael Kay

Reputation: 163312

There's an answer to this question buried within

XSLT transformation of boolean expressions

but since there's a lot of noise around it there, I'll extract the relevant part.

Given a sequence of elements with level numbers:

<a level="1"/>
<b level="2"/>
<c level="3"/>
<d level="3"/>
<e level="2"/>

we can turn them into a tree structure

<a><b><c/><d/></b><e/></a>

using recursive grouping as follows. We write a template that does one level of grouping, and then calls itself recursively to do the next level:

<xsl:template name="grouping">
  <xsl:param name="input" as="element()*"/>
  <xsl:if test="exists($input)">
    <xsl:variable name="level" select="$input[1]/@level"/>
    <xsl:for-each-group select="$input" 
                        group-starting-with="*[@level=$level]">
      <xsl:copy>
        <xsl:call-template name="grouping">
           <xsl:with-param name="input" 
                           select="current-group()[position() gt 1]"/>
        </xsl:call-template>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:if>
</xsl:template>

That's using XSLT 2.0. A solution using XSLT 1.0 is going to be much, much harder.

Of course your input has a lot of M$ noise compared with my little sample. But the structure of the problem is the same.
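
As a rough sketch of how the same recursion might be adapted to the paragraphs in the question: this assumes an XSLT 2.0 processor and assumes the list paragraphs are direct children of `w:body`. The template name `nest` and the parameter `items` are hypothetical. The level is read from `w:ilvl/@w:val`, `w:numId` is ignored, and `<ul>` is used only because the question's stylesheet uses it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">

  <xsl:output method="html"/>

  <!-- Assumption: paragraphs sit directly under w:body. Each run of
       adjacent paragraphs that carry w:numPr is treated as one list. -->
  <xsl:template match="w:body">
    <xsl:for-each-group select="w:p"
                        group-adjacent="exists(w:pPr/w:numPr)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <xsl:call-template name="nest">
            <xsl:with-param name="items" select="current-group()"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <!-- One level of grouping, then recursion for the deeper levels
       (hypothetical name "nest", same idea as "grouping" above). -->
  <xsl:template name="nest">
    <xsl:param name="items" as="element(w:p)*"/>
    <xsl:if test="exists($items)">
      <!-- The level of the first paragraph defines this list's level -->
      <xsl:variable name="level"
                    select="$items[1]/w:pPr/w:numPr/w:ilvl/@w:val"/>
      <ul>
        <xsl:for-each-group select="$items"
            group-starting-with="w:p[w:pPr/w:numPr/w:ilvl/@w:val = $level]">
          <li>
            <!-- Context item is the first paragraph of the group;
                 join its text runs without a separator -->
            <xsl:value-of select="w:r/w:t" separator=""/>
            <!-- Remaining paragraphs in the group are deeper levels -->
            <xsl:call-template name="nest">
              <xsl:with-param name="items"
                              select="current-group()[position() gt 1]"/>
            </xsl:call-template>
          </li>
        </xsl:for-each-group>
      </ul>
    </xsl:if>
  </xsl:template>

  <!-- Placeholder for non-list paragraphs; the style-based tests from
       the question's w:p template would go here instead. -->
  <xsl:template match="w:p">
    <p><xsl:value-of select="w:r/w:t" separator=""/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because the level is taken from the data rather than hard-coded in `xsl:when` branches, the same template should cope with any depth of `w:ilvl`. On the sample in the question, the level 2 paragraphs end up in a `<ul>` inside the "Third Item" `<li>`, and the level 3 paragraphs in a further `<ul>` inside the last level 2 item.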

Upvotes: 2
