slucarini

Reputation: 41

regular expression in XPATH

How can I use the XPath matches function to search for whole words in the text of an XML element?

The following code returns the error "unknown method matches":

XML_Doc:=CreateOleObject('Msxml2.DOMDocument.6.0') as IXMLDOMDocument3;
XML_DOC.selectNodes('/DATI/DATO[matches(TEST_TAG,"\bTest\b")]');

Example XML file:

<DATI>
 <DATO>
   <TEST_TAG>Test</TEST_TAG>
 </DATO>
 <DATO>
   <TEST_TAG>Test21</TEST_TAG>
 </DATO>
 <DATO>
   <TEST_TAG>Abc</TEST_TAG>
 </DATO>
</DATI>
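For illustration only, here is a minimal Python sketch of the whole-word test the question is after, using the standard `re` module on the sample document. It stands in for XPath 2.0 `matches()` (which, like `re.search`, succeeds if the pattern matches anywhere in the string). The names `XML`, `WHOLE_WORD_TEST` and `select_whole_word` are hypothetical. One caveat: as far as I know, XPath 2.0 regular expressions follow the XML Schema syntax, which has no `\b` escape, so even an XPath 2.0 processor would likely reject that exact pattern.

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<DATI>
 <DATO><TEST_TAG>Test</TEST_TAG></DATO>
 <DATO><TEST_TAG>Test21</TEST_TAG></DATO>
 <DATO><TEST_TAG>Abc</TEST_TAG></DATO>
</DATI>"""

# In Python, \b is a boundary between a word character (\w) and a non-word character
# (or the start/end of the string), so "Test21" does NOT match: "2" follows "Test".
WHOLE_WORD_TEST = re.compile(r"\bTest\b")


def select_whole_word(xml_text):
    """Return the TEST_TAG values that contain "Test" as a whole word."""
    root = ET.fromstring(xml_text)
    return [
        el.text or ""
        for el in root.iter("TEST_TAG")
        if WHOLE_WORD_TEST.search(el.text or "")
    ]


print(select_whole_word(XML))  # ['Test']
```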

Upvotes: 3

Views: 641

Answers (2)

Dimitre Novatchev

Reputation: 243449

Suppose that by "word" you mean a string that starts with a Latin letter and consists only of Latin letters and decimal digits.

Then one can use an XPath 1.0 expression to find exactly these:

  //TEST_TAG
    [contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
              substring(.,1,1)
              )
   and
     not(
     translate(.,
               'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
               '')
         )
    ]
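To make the three XPath 1.0 functions easier to follow, here is a minimal Python sketch that mirrors the predicate step by step on the modified sample document used below. The helper names (`LETTERS`, `LETTERS_AND_DIGITS`, `is_word`) are hypothetical, and `el.text` is used as a stand-in for the element's XPath string value, which is equivalent only for simple text-only elements like these.

```python
import xml.etree.ElementTree as ET

LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
LETTERS_AND_DIGITS = LETTERS + "0123456789"


def is_word(value):
    # contains(LETTERS, substring(., 1, 1)): the first character must be a Latin letter.
    # As in XPath, "" counts as contained in any string, so an empty value passes here.
    starts_with_letter = value[:1] in LETTERS
    # translate(., LETTERS_AND_DIGITS, ''): delete every Latin letter and decimal digit.
    leftover = "".join(ch for ch in value if ch not in LETTERS_AND_DIGITS)
    # not(...): an empty leftover means the value held nothing but letters and digits.
    return starts_with_letter and not leftover


# The question's sample with one value changed to an illegal "word".
XML = """<DATI>
 <DATO><TEST_TAG>Test</TEST_TAG></DATO>
 <DATO><TEST_TAG>#$%Test21</TEST_TAG></DATO>
 <DATO><TEST_TAG>Abc</TEST_TAG></DATO>
</DATI>"""

root = ET.fromstring(XML)
selected = [el.text or "" for el in root.iter("TEST_TAG") if is_word(el.text or "")]
print(selected)  # ['Test', 'Abc']
```

Because `substring('', 1, 1)` is the empty string and `contains()` returns true for an empty second argument, an empty `<TEST_TAG/>` also satisfies the expression; adding `and string-length(.) > 0` to the predicate would exclude it.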

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
     <xsl:copy-of select=
     "//TEST_TAG
        [contains('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
                  substring(.,1,1)
                  )
       and
         not(
         translate(.,
                   'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
                   '')
             )
        ]
   "/>
 </xsl:template>
</xsl:stylesheet>

When applied to this XML document (the provided one, but with one value changed to an illegal "word"):

<DATI>
    <DATO>
        <TEST_TAG>Test</TEST_TAG>
    </DATO>
    <DATO>
        <TEST_TAG>#$%Test21</TEST_TAG>
    </DATO>
    <DATO>
        <TEST_TAG>Abc</TEST_TAG>
    </DATO>
</DATI>

the transformation evaluates the XPath expression above and copies the selected elements to the output:

<TEST_TAG>Test</TEST_TAG>
<TEST_TAG>Abc</TEST_TAG>

Do note:

The currently accepted answer incorrectly selects this:

<TEST_TAG>#$%Test21</TEST_TAG>

as an element whose string value is a "word".

Upvotes: 0

BeniBela

Reputation: 16917

matches is an XPath 2.0 function, and MSXML supports only XPath 1.0.
As far as I know, there is no XPath 2.0 library for Delphi (although I wrote an XPath 2.0 library for Free Pascal, and it should not be too difficult to port).

You could use

/DATI/DATO[not(contains(TEST_TAG," "))]

to find values that do not contain a space; this is valid XPath 1.0.

Upvotes: 4
