jackthehipster
jackthehipster

Reputation: 1018

Why is this xmlns attribute messing up my xpath query?

I'm parsing a simple jhove output using LibXML. However, I don't get the values I expect. Here's the code:

use feature "say";
use XML::LibXML;

my $PRSR = XML::LibXML->new();
my $xs=<DATA>; 
say $xs;
my $t1 = $PRSR->load_xml(string => $xs);
say "1:" . $t1->findvalue('//date');
$xs=<DATA>; 
say $xs;
$t1 = $PRSR->load_xml(string => $xs);
say "2:" . $t1->findvalue('//date');


__DATA__
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove http://hul.harvard.edu/ois/xml/xsd/jhove/1.3/jhove.xsd" name="Jhove" release="1.0 (beta 3)" date="2005-02-04"><date>2006-10-06T09:11:34+02:00</date></jhove>
<jhove><date>2006-10-06T09:11:34+02:00</date></jhove>

As you can see, the line "1:" is returning an empty string, while "2:" is returning the expected date. What is in the jhove-root-element that keeps the xpath query from working properly? I even tried in XML-Spy and there it works, even with the full header.

Edit: When I remove the xmlns-attribute from the root element, the xpath query works. But how is that possible?

Upvotes: 2

Views: 718

Answers (2)

jackthehipster
jackthehipster

Reputation: 1018

I found another solution. Simply using this

say "1:" . $t1->findvalue('//*[local-name()="date"]');

will also find the value and save the hassle of declaring namespaces in an XPathContext. But apart from that, tobyinks answer is the correct one.

Upvotes: 2

tobyink
tobyink

Reputation: 13664

The XML::LibXML::Node documentation specifically mentions this issue and how to deal with it...

NOTE ON NAMESPACES AND XPATH:

A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace.

So, for example, one cannot match the root element of an XHTML document with $node->find('/html') since '/html' would only match if the root element <html> had no namespace, but all XHTML elements belong to the namespace http://www.w3.org/1999/xhtml. (Note that xmlns="..." namespace declarations can also be specified in a DTD, which makes the situation even worse, since the XML document looks as if there was no default namespace).

There are several possible ways to deal with namespaces in XPath:

  • The recommended way is to use the XML::LibXML::XPathContext module to define an explicit context for XPath evaluation, in which a document independent prefix-to-namespace mapping can be defined. For example:

    my $xpc = XML::LibXML::XPathContext->new;
    $xpc->registerNs('x', 'http://www.w3.org/1999/xhtml');
    $xpc->find('/x:html',$node);
    
  • Another possibility is to use prefixes declared in the queried document (if known). If the document declares a prefix for the namespace in question (and the context node is in the scope of the declaration), XML::LibXML allows you to use the prefix in the XPath expression, e.g.:

    $node->find('/x:html');
    

Upvotes: 5

Related Questions