skyman

Reputation: 2335

Using XPATH to select from a specific range of repeated nodes

I am attempting to parse a document that has the following (much simplified) structure. Each document can have one or more OBR segments, each followed by one or more OBX segments. The OBX segments relate directly to the preceding OBR segment.

<obr> ...... </obr>
<obx> ...... </obx>
<obx> ...... </obx>
<obx> ...... </obx>
<obr> ...... </obr>
<obx> ...... </obx>
<obx> ...... </obx>

The following is a more detailed though still simplified example:

<OBR>
    <OBR.1>
        <OBR.1.1>1</OBR.1.1>
    </OBR.1>
    <OBR.2/>
    <OBR.3>
        <OBR.3.1>12345678</OBR.3.1>
        <OBR.3.2>PLS</OBR.3.2>
    </OBR.3>
    <OBR.4>
        <OBR.4.1>CRP, LFT, Ue</OBR.4.1>
        <OBR.4.2>C Reactive protein, Liver Function Tests, Urea, Elec, Creat</OBR.4.2>
        <OBR.4.3>PLS</OBR.4.3>
    </OBR.4>
</OBR>
<OBX>
    <OBX.1>
        <OBX.1.1>1</OBX.1.1>
    </OBX.1>
    <OBX.2>
        <OBX.2.1>NM</OBX.2.1>
    </OBX.2>
    <OBX.3>
        <OBX.3.1>CRP</OBX.3.1>
        <OBX.3.2>C-Reactive Protein</OBX.3.2>
        <OBX.3.3>PLS</OBX.3.3>
    </OBX.3>
</OBX>

I need to develop an XPath expression and/or Java code that can extract text from a specific OBR segment together with the text of its multiple OBX segments. It is straightforward to extract the index'th OBX.3.2 in the entire document using:

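// Assumes xPath (javax.xml.xpath.XPath) and xmlDocument (org.w3c.dom.Document)
// are fields of the enclosing class.
public Object read(String expression, QName returnType, int index) {
    // Wrap in parentheses so [index] (1-based) selects from the whole document-wide node-set.
    expression = "(" + expression + ")[" + index + "]";
    try {
        XPathExpression xPathExpression = xPath.compile(expression);
        return xPathExpression.evaluate(xmlDocument, returnType);
    } catch (XPathExpressionException ex) {
        ex.printStackTrace();
        return null;
    }
}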
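The parentheses in the read helper matter: in XPath 1.0, (//OBX/OBX.3/OBX.3.2)[2] picks the second match in the whole document, whereas //OBX/OBX.3/OBX.3.2[2] picks the second OBX.3.2 child of each OBX.3 (usually none). A minimal, self-contained sketch of the difference, assuming the segments are wrapped in a single root element (called HL7 here purely for illustration); the class name IndexDemo is hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class IndexDemo {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Evaluates an XPath 1.0 expression and returns its string value ("" if nothing matched).
    static String eval(Document doc, String expression) {
        try {
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Same wrapping as read(): the 1-based index applies to the whole node-set.
    static String nth(Document doc, String expression, int index) {
        return eval(doc, "(" + expression + ")[" + index + "]");
    }
}
```

With two OBX segments, nth(doc, "//OBX/OBX.3/OBX.3.2", 2) returns the second OBX.3.2 in the document, while eval(doc, "//OBX/OBX.3/OBX.3.2[2]") returns an empty string because no OBX.3 has a second OBX.3.2 child.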

But I am not sure how to find the index'th OBX.3.2 associated with, say, the second OBR, or indeed how to count the number of OBX segments for each OBR (if I knew this I could probably solve the problem). Any guidance or advice would be much appreciated.
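One XPath 1.0 way to scope OBX segments to a given OBR (a different technique from the ones discussed in this thread) is to keep only the OBX siblings whose number of preceding OBR siblings equals n: those are exactly the OBX segments between the n'th OBR and the next one, and the last OBR needs no special case. A sketch, assuming all OBR and OBX elements are siblings under one parent; the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ObxByObr {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // OBX siblings after the n-th OBR that have exactly n OBR siblings before them,
    // i.e. the OBX segments up to (not including) the next OBR, if there is one.
    static String obxOf(int n) {
        return "(//OBR)[" + n + "]/following-sibling::OBX"
                + "[count(preceding-sibling::OBR) = " + n + "]";
    }

    // Number of OBX segments belonging to the n-th OBR (1-based).
    static int countObx(Document doc, int n) {
        return ((Number) eval(doc, "count(" + obxOf(n) + ")", XPathConstants.NUMBER)).intValue();
    }

    // OBX.3.2 text of the k-th OBX (1-based) belonging to the n-th OBR; "" if absent.
    static String obx32(Document doc, int n, int k) {
        return (String) eval(doc, "(" + obxOf(n) + ")[" + k + "]/OBX.3/OBX.3.2",
                XPathConstants.STRING);
    }

    private static Object eval(Document doc, String expression, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

For example, obx32(doc, 2, 1) is the OBX.3.2 of the first OBX under the second OBR.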

I have also tried the Kayessian method for node-set intersection ($ns1[count(.| $ns2)=count($ns2)]) to count elements, giving:

count( //OBR[3]/following-sibling::OBX [ count (.|//OBR[4]/preceding-sibling::OBX) = count(//OBR[4]/preceding-sibling::OBX )])

This expression gives the number of OBX elements between the indexed OBR and the next OBR. It does not, however, correctly handle the last OBR in the group: there is no OBR after it, so the second node-set is empty and the predicate never matches.
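One way to keep the Kayessian intersection and still handle the last OBR is to choose the expression in Java: when i is the last OBR there is no upper bound, so every following OBX counts. A sketch under the assumption that all OBR and OBX elements share one parent; it uses (//OBR)[i], the i'th OBR in document order, which matches //OBR[i] in that case. The class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class KayessianCount {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Evaluates a numeric XPath 1.0 expression such as count(...).
    static int number(Document doc, String expression) {
        try {
            Number n = (Number) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Number of OBX segments between the i-th OBR (1-based) and the next OBR.
    static int countObx(Document doc, int i) {
        String after = "(//OBR)[" + i + "]/following-sibling::OBX";
        if (i >= number(doc, "count(//OBR)")) {
            // Last OBR: no following OBR bounds the group, so count every following OBX.
            return number(doc, "count(" + after + ")");
        }
        // Kayessian intersection $ns1[count(.|$ns2) = count($ns2)]:
        // OBX after OBR i that are also before OBR i+1.
        String before = "(//OBR)[" + (i + 1) + "]/preceding-sibling::OBX";
        return number(doc, "count(" + after
                + "[count(. | " + before + ") = count(" + before + ")])");
    }
}
```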

Upvotes: 0

Views: 1307

Answers (2)

skyman

Reputation: 2335

In case others stumble across this: the solution to counting the segments is fairly simple.

To count the OBX segments after the i'th OBR (substituting a number for i):

count(//OBR[i]/following-sibling::OBX) - count(//OBR[i+1]/following-sibling::OBX) 

This also works for the last OBR: //OBR[i+1] then selects nothing, so the subtracted count is 0. It is then possible to loop through the appropriate segments using the reader code above.
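The loop can be sketched as follows. It assumes all OBR and OBX elements are siblings under one root, and turns a position within the i'th OBR's group into a document-wide OBX position (the kind of index the reader code expects) by adding the number of OBX segments before that OBR. It uses (//OBR)[i], which matches //OBR[i] when the OBR elements share a parent; the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ObxLoop {
    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Evaluates an XPath 1.0 expression with the given result type.
    private static Object eval(Document doc, String expression, javax.xml.namespace.QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    private static int num(Document doc, String expression) {
        return ((Number) eval(doc, expression, XPathConstants.NUMBER)).intValue();
    }

    // count(OBX after OBR i) - count(OBX after OBR i+1); the second term is 0 for the last OBR.
    static int obxCount(Document doc, int i) {
        return num(doc, "count((//OBR)[" + i + "]/following-sibling::OBX)")
                - num(doc, "count((//OBR)[" + (i + 1) + "]/following-sibling::OBX)");
    }

    // OBX.3.2 of every OBX belonging to the i-th OBR, read by document-wide OBX position.
    static List<String> obx32Values(Document doc, int i) {
        int offset = num(doc, "count((//OBR)[" + i + "]/preceding-sibling::OBX)");
        List<String> values = new ArrayList<>();
        for (int k = 1; k <= obxCount(doc, i); k++) {
            values.add((String) eval(doc, "(//OBX)[" + (offset + k) + "]/OBX.3/OBX.3.2",
                    XPathConstants.STRING));
        }
        return values;
    }
}
```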

Upvotes: 0

newtover

Reputation: 32094

I would transform the original XML into a more convenient form with XSLT and work with the result.

An example XSLT transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

<xsl:template match="OBR">
    <xsl:variable name="cur_OBR" select="."/>
    <xsl:variable
        name="obx"
        select="following-sibling::OBX[preceding-sibling::OBR[1] = $cur_OBR]"/>

    <OBR position="{count(preceding-sibling::OBR) + 1}">
        <xsl:for-each select="*/*">
            <xsl:variable name="suffix"
                          select="substring-after(name(), 'OBR')"/>
            <xsl:variable name="rel_obx"
                          select="$obx/*/*[name() = concat('OBX', $suffix)]"/>
            <xsl:apply-templates select="." mode="sub_OBR">
                <xsl:with-param name="suffix" select="$suffix"/>
                <xsl:with-param name="rel_obx" select="$rel_obx"/>
            </xsl:apply-templates>
        </xsl:for-each>
    </OBR>
</xsl:template>

<xsl:template match="OBX"/>

<xsl:template mode="sub_OBR" match="*">
    <xsl:param name="suffix" select="substring-after(name(), 'OBR')"/>
    <xsl:param name="rel_obx"/>

    <xsl:element name="{concat('OBR', $suffix)}">
        <OBR>
            <xsl:apply-templates select="text()"/>
        </OBR>
        <xsl:for-each select="$rel_obx">
            <OBX>
                <xsl:apply-templates select="text()"/>
            </OBX>
        </xsl:for-each>
    </xsl:element>
</xsl:template>
</xsl:stylesheet>

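I have not run it, though, so I cannot confirm that it is entirely correct. Note also that you need a variable to select the OBX elements corresponding to an OBR with XPath: following-sibling::OBX[preceding-sibling::OBR[1] = $cur_OBR]. The = operator compares string values; generate-id(preceding-sibling::OBR[1]) = generate-id($cur_OBR) compares node identity instead, which is safer if two OBR segments could have identical text.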
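To apply a stylesheet like the one above from Java, the JDK's built-in javax.xml.transform API (an XSLT 1.0 processor) is one option. A minimal sketch; the class name XsltRunner and passing the stylesheet as a string are assumptions:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltRunner {
    // Applies an XSLT stylesheet to an XML string using the JDK's default
    // TransformerFactory and returns the serialized result.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

The returned string can then be parsed again and queried with simpler XPath expressions, since each generated OBR element carries its related OBX values.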

Upvotes: 1
