danorton

Reputation: 12015

PHP xpath query on XML with default namespace binding

I have one solution to the subject problem, but it’s a hack and I’m wondering if there’s a better way to do this.

Below is a sample XML file and a PHP CLI script that executes an xpath query given as an argument. For this test case, the command line is:

./xpeg "//MainType[@ID=123]"

What seems most strange is this line, without which my approach doesn’t work:

$domdoc->loadXML($domdoc->saveXML($domdoc));

As far as I know, this simply re-parses the modified XML, and it seems to me that this shouldn’t be necessary.

Is there a better way to perform xpath queries on this XML in PHP?


XML (note the binding of the default namespace):

<?xml version="1.0" encoding="utf-8"?>
<MyRoot
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.example.com/data http://www.example.com/data/MyRoot.xsd"
 xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site">
    <Price>$0.20</Price>
    <TheUrl><![CDATA[http://www.example.com/path1/]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="123" comment="Test site">
    <Price>$99.95</Price>
    <TheUrl><![CDATA[http://www.example.com/path2]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="922" comment="Health Insurance">
    <Price>$600.00</Price>
    <TheUrl><![CDATA[http://www.example.com/eg/xyz.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
  <MainType ID="389" comment="Used Cars">
    <Price>$5000.00</Price>
    <TheUrl><![CDATA[http://www.example.com/tata.php]]></TheUrl>
    <Validated>N</Validated>
  </MainType>
</MyRoot>

PHP CLI Script:

#!/usr/bin/php-cli
<?php

$xml = file_get_contents("xpeg.xml");

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

// remove the default namespace binding
$e = $domdoc->documentElement;
$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");

// hack hack, cough cough, hack hack
$domdoc->loadXML($domdoc->saveXML($domdoc));

$xpath = new DOMXpath($domdoc);

$str = trim($argv[1]);
$result = $xpath->query($str);
if ($result !== FALSE) {
  dump_dom_levels($result);
}
else {
  echo "error\n";
}

// The following function isn't really part of the
// question. It simply provides a concise summary of
// the result.
function dump_dom_levels($node, $level = 0) {
  $class = get_class($node);
  if ($class == "DOMNodeList") {
    echo "Level $level ($class): $node->length items\n";
    foreach ($node as $child_node) {
      dump_dom_levels($child_node, $level+1);
    }
  }
  else {
    $nChildren = 0;
    foreach ($node->childNodes as $child_node) {
      if ($child_node->hasChildNodes()) {
        $nChildren++;
      }
    }
    if ($nChildren) {
      echo "Level $level ($class): $nChildren children\n";
    }
    foreach ($node->childNodes as $child_node) {
      if ($child_node->hasChildNodes()) {
        dump_dom_levels($child_node, $level+1);
      }
    }
  }
}
?>

Upvotes: 6

Views: 6131

Answers (4)

Tertium

Reputation: 6308

As a variant, you can also use an XPath expression that matches on the local name and ignores the namespace:

//*[local-name(.) = 'MainType'][@ID='123']
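As a minimal sketch of how that expression can be run from PHP, assuming a trimmed-down copy of the sample document from the question (the variable names here are illustrative only):

```php
<?php
// Trimmed-down copy of the sample document from the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<MyRoot xmlns="http://www.example.com/data">
  <MainType ID="192" comment="Bob's site"><Price>$0.20</Price></MainType>
  <MainType ID="123" comment="Test site"><Price>$99.95</Price></MainType>
</MyRoot>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// local-name() compares only the part of the name after any prefix,
// so no namespace prefix has to be registered for this query.
$nodes = $xpath->query("//*[local-name(.) = 'MainType'][@ID='123']");

foreach ($nodes as $node) {
    echo $node->getAttribute('comment'), "\n"; // Test site
}
```

The trade-off is that this matches a MainType element in any namespace, which may or may not be what you want.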

Upvotes: 0

danorton

Reputation: 12015

Given the current state of the XPath language, I feel that the best answer is the one provided by Tomalak: associate a prefix with the default namespace and prefix all tag names in the query. That’s the solution I intend to use in my current application.

When that’s not possible or practical, a better solution than my hack is to invoke a method that does the same thing as re-scanning (hopefully more efficiently): DOMDocument::normalizeDocument(). The method behaves “as if you saved and then loaded the document, putting the document in a ‘normal’ form.”

Upvotes: 0

Tomalak

Reputation: 338228

The solution is to use the namespace, not to get rid of it.

$domdoc = new DOMDocument();
$domdoc->loadXML($xml);

$xpath = new DOMXPath($domdoc);
// bind the prefix "x" to the namespace URI given as the second argument
$xpath->registerNamespace("x", trim($argv[2]));

$str = trim($argv[1]);
$result = $xpath->query($str);

Then call it like this on the command line (note the x: in the XPath expression):

./xpeg "//x:MainType[@ID=123]" "http://www.example.com/data"

You can polish this further by

  • finding out the default namespace yourself (by looking at the namespaceURI property of the document element)
  • supporting more than one namespace on the command line and registering them all before calling $xpath->query()
  • supporting arguments in the form of xyz=http://namespace.uri/ to create custom namespace prefixes
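Those refinements could be sketched roughly as follows. The helper name register_namespaces, the default prefix "x" and the hard-coded binding list are assumptions made for illustration; in the real script the bindings would come from the remaining $argv entries.

```php
<?php
// Hypothetical helper (not part of the original script): registers the
// document element's namespace under $defaultPrefix, then every
// "prefix=uri" binding in $bindings.
function register_namespaces(DOMXPath $xpath, DOMDocument $doc, array $bindings, $defaultPrefix = 'x')
{
    $uri = $doc->documentElement->namespaceURI;
    if ($uri !== null && $uri !== '') {
        $xpath->registerNamespace($defaultPrefix, $uri);
    }
    foreach ($bindings as $binding) {
        // Split on the first "=" only, since a URI may itself contain "=".
        $parts = explode('=', $binding, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $xpath->registerNamespace($parts[0], $parts[1]);
        }
    }
}

$xml = '<MyRoot xmlns="http://www.example.com/data">'
     . '<MainType ID="123" comment="Test site"/>'
     . '</MyRoot>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Stand-in for extra command-line arguments such as $argv[2], $argv[3], ...
register_namespaces($xpath, $doc, ['d=http://www.example.com/data']);

$viaDefault = $xpath->query('//x:MainType[@ID=123]');
$viaCustom  = $xpath->query('//d:MainType[@ID=123]');
echo $viaDefault->length, ' ', $viaCustom->length, "\n"; // 1 1
```

Note that this picks up whatever namespace the document element is in, whether it was declared as the default or with a prefix; elements in other namespaces still need their own bindings.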

The bottom line: in XPath you can't query //foo when you really mean //namespace:foo. An unprefixed name in XPath always refers to an element in no namespace, so the two expressions are fundamentally different and select different nodes. The fact that an XML document can declare a default namespace (and thus drop explicit prefixes in the document) does not mean you can drop namespace usage in XPath.
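A small sketch that makes the difference visible, assuming a cut-down two-element version of the sample document:

```php
<?php
$xml = '<MyRoot xmlns="http://www.example.com/data">'
     . '<MainType ID="192"/><MainType ID="123"/>'
     . '</MyRoot>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('x', 'http://www.example.com/data');

// Unprefixed: looks for MainType elements in no namespace.
$unprefixed = $xpath->query('//MainType');
// Prefixed: looks for MainType elements in http://www.example.com/data.
$prefixed = $xpath->query('//x:MainType');

echo "unprefixed: {$unprefixed->length}\n"; // unprefixed: 0
echo "prefixed: {$prefixed->length}\n";     // prefixed: 2
```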

Upvotes: 13

cwallenpoole

Reputation: 82028

Just out of curiosity, what happens if you remove this line?

$e->removeAttributeNS($e->getAttributeNode("xmlns")->nodeValue,"");

That strikes me as the most likely to cause the need for your hack. You're basically removing the xmlns="http://www.example.com/data" part and then re-building the DOMDocument. Have you considered simply using string functions to remove that namespace?

$pieces = explode('xmlns="', $xml);
$xml = $pieces[0] . substr($pieces[1], strpos($pieces[1], '"') + 1);

Then continue on your way? It might even end up being faster.
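For comparison, here is a rough end-to-end sketch of the same idea using preg_replace instead of explode. This is a swapped-in variant, not the code above, and it assumes the declaration is written with double quotes exactly as in the sample file:

```php
<?php
$xml = '<MyRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
     . ' xmlns="http://www.example.com/data">'
     . '<MainType ID="123" comment="Test site"/>'
     . '</MyRoot>';

// Remove only the first default-namespace declaration. Prefixed
// declarations such as xmlns:xsi="..." are not matched, because the
// pattern requires "=" directly after "xmlns".
$stripped = preg_replace('/\sxmlns="[^"]*"/', '', $xml, 1);

$doc = new DOMDocument();
$doc->loadXML($stripped);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//MainType[@ID=123]');
echo $nodes->length, "\n"; // 1
```

Keep in mind that this changes what the document means: the elements are no longer in the http://www.example.com/data namespace, so it is only suitable when nothing downstream depends on that.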

Upvotes: 1
