bornfromanegg

Reputation: 2918

Strange XPath behavior using XML namespaces on XPather.com?

I have the following XML:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:o="urn:schemas-microsoft-com:office:office"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
  <Names>
    <NamedRange ss:Name="SomeNamedRange" ss:RefersTo="=Control!R1C1:R51C4"/>
  </Names>
  <Worksheet ss:Name="Control" ss:Protected="1">
    <Table ss:ExpandedColumnCount="4" ss:ExpandedRowCount="51">
      <Row>
        <Cell ss:StyleID="s145">          
          <Comment ss:Author="Some comment here">
            <ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>
          </Comment>          
        </Cell>
      </Row>      
    </Table>
  </Worksheet>
</Workbook>

I would like to get the Names element with XPath, so I try:

//Names

but this doesn't work. So far, I have found a number of ways to fix this.

//ss:Names
//*:Names
//*[local-name()='Names']

OR, I can delete the following element:

<ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>

So clearly this has something to do with namespaces, but I still don't really understand what's going on. So I have three questions:

  1. Why does deleting the ss:Data element affect being able to read the Names element?
  2. Given that there are 5 namespaces declared at the top, why is the Names element considered to be in the ss namespace (when the ss:Data element exists)?
  3. What is the correct general approach here? I feel like there is some general piece of information I'm missing about either XML or XPath.

EDIT:

This issue is not limited to http://xpather.com/. I have had various results with different XPath websites, and have summarised the results in my answer below.

Upvotes: 2

Views: 257

Answers (2)

kjhughes

Reputation: 111491

You are right to be puzzled.

Just deleting ss:Data should not cause //Names to suddenly select the Names child of Workbook when Workbook declares a default namespace of urn:schemas-microsoft-com:office:spreadsheet. You appear to have stumbled across a bug in xpather.com. Note that their opening, default XML has the following disclaimer regarding namespaces:

This application is in an early beta version so please be forgiving. XPath 2.0 is supported but namespaces are still being added and they may not fully work yet. Please send your comments to: [email protected]
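For comparison, here is a minimal sketch of what a namespace-aware processor does with a trimmed copy of the workbook, using Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath). The constant `SS` and the variable names are illustrative, not from the question. It shows that the default namespace and the `ss` prefix are bound to the same URI, so `Names` is in that namespace whether or not it carries a prefix, and that the unprefixed path `Names` matches nothing even with `ss:Data` present.

```python
import xml.etree.ElementTree as ET

# Illustrative constant: the URI bound to both the default namespace and "ss".
SS = "urn:schemas-microsoft-com:office:spreadsheet"

# Trimmed copy of the workbook from the question.
XML = f"""<Workbook xmlns="{SS}" xmlns:ss="{SS}">
  <Names>
    <NamedRange ss:Name="SomeNamedRange" ss:RefersTo="=Control!R1C1:R51C4"/>
  </Names>
  <Worksheet ss:Name="Control">
    <Comment ss:Author="Some comment here">
      <ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>
    </Comment>
  </Worksheet>
</Workbook>"""

root = ET.fromstring(XML)

# ElementTree stores each tag as {namespace-uri}local-name, so the
# namespace every element really belongs to is visible directly.
tags = [el.tag for el in root.iter()]

# An unprefixed name in the path means "no namespace": nothing matches.
unqualified = root.findall(".//Names")

# The prefix used in the path is arbitrary; only the URI it maps to matters.
qualified = root.findall(".//ss:Names", {"ss": SS})
clark = root.findall(f".//{{{SS}}}Names")

print(tags)
print(len(unqualified), len(qualified), len(clark))
```

Note that `ss:Data` stays in the spreadsheet namespace: the `xmlns="http://www.w3.org/TR/REC-html40"` on it only changes the default namespace for unprefixed names at or below that element, and has no effect on `Names`.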

See also (for general XPath in namespaces guidance):


Another xpather.com issue

Currently, xpather.com does not understand that element names may include period (.) characters.


And yet another xpather.com issue

This fully compliant XPath,

//item/comment()[not(preceding-sibling::*)]

results in the following (improper) error message on xpather.com:

TypeError: Cannot read property 'childPosition' of undefined

Upvotes: 1

bornfromanegg

Reputation: 2918

I've decided to add this as an answer rather than an edit to the original question since I still may be missing something, but thanks to the comment/answers from @GSerg and @kjhughes, I did some investigation. If this turns out to be useful, I can edit the question and add it in.

The following is just a handful of websites for online XPath evaluation, and how they behaved in my scenario.

+--------------------------------------------------------+--------------+-------------+------------+------------+
|                                                        |     With <ss:Data>         |    Without <ss:Data>    |
+--------------------------------------------------------+--------------+-------------+------------+------------+
|                                                        | //Names      | //ss:Names  | //Names    | //ss:Names |
+--------------------------------------------------------+--------------+-------------+------------+------------+
| https://www.freeformatter.com/xpath-tester.html        | No Match     | Match       | Match      | Match      |
| https://codebeautify.org/Xpath-Tester                  | No Match     | No Match    | No Match   | No Match   |
| http://xpather.com/                                    | No Match     | Match       | Match      | Match      |
| https://www.webtoolkitonline.com/xml-xpath-tester.html | No Match     | Error       | No Match   | Error      |
| http://www.utilities-online.info/xpath/#.Xe4VtTP7QuU   | No Match     | No Match    | No Match   | No Match   |
| https://extendsclass.com/xpath-tester.html             | No Match     | Match       | No Match   | Match      |
+--------------------------------------------------------+--------------+-------------+------------+------------+

From what I understand of the answers so far, the only one that is behaving completely sensibly seems to be ExtendsClass, although freeformatter and xpather do produce the right results when the namespace is specified.
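As a rough cross-check, the four cells of a table row can be reproduced locally. The sketch below uses Python's standard `xml.etree.ElementTree` on a cut-down workbook; the helper names `workbook` and `row` are hypothetical, and ElementTree's `.//` from the root element stands in for `//` here since `Names` is a descendant of the root. Under those assumptions it gives the same pattern as the ExtendsClass row: the unprefixed path never matches, the prefixed one always does.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
DATA = '<ss:Data xmlns="http://www.w3.org/TR/REC-html40"></ss:Data>'


def workbook(with_data: bool) -> ET.Element:
    """Build a cut-down workbook, optionally containing the ss:Data element."""
    body = DATA if with_data else ""
    return ET.fromstring(
        f'<Workbook xmlns="{SS}" xmlns:ss="{SS}">'
        f"<Names/><Worksheet><Comment>{body}</Comment></Worksheet>"
        f"</Workbook>"
    )


def row() -> dict:
    """Return {(with_data, path): matched} for the four table columns."""
    results = {}
    for with_data in (True, False):
        root = workbook(with_data)
        results[(with_data, "//Names")] = bool(root.findall(".//Names"))
        results[(with_data, "//ss:Names")] = bool(
            root.findall(".//ss:Names", {"ss": SS})
        )
    return results


for (with_data, path), matched in row().items():
    label = "With <ss:Data>" if with_data else "Without <ss:Data>"
    print(f"{label:18} {path:11} {'Match' if matched else 'No Match'}")
```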

It should also be pointed out that xpather does clearly announce its beta status, and also has a nice UI.

Upvotes: 1
