Reputation: 1018
I'm parsing a simple jhove output using LibXML. However, I don't get the values I expect. Here's the code:
use feature "say";
use XML::LibXML;
my $PRSR = XML::LibXML->new();
my $xs=<DATA>;
say $xs;
my $t1 = $PRSR->load_xml(string => $xs);
say "1:" . $t1->findvalue('//date');
$xs=<DATA>;
say $xs;
$t1 = $PRSR->load_xml(string => $xs);
say "2:" . $t1->findvalue('//date');
__DATA__
<jhove xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove http://hul.harvard.edu/ois/xml/xsd/jhove/1.3/jhove.xsd" name="Jhove" release="1.0 (beta 3)" date="2005-02-04"><date>2006-10-06T09:11:34+02:00</date></jhove>
<jhove><date>2006-10-06T09:11:34+02:00</date></jhove>
As you can see, the line "1:" is returning an empty string, while "2:" is returning the expected date. What is in the jhove-root-element that keeps the xpath query from working properly? I even tried in XML-Spy and there it works, even with the full header.
Edit: When I remove the xmlns-attribute from the root element, the xpath query works. But how is that possible?
Upvotes: 2
Views: 718
Reputation: 1018
I found another solution. Simply using this
say "1:" . $t1->findvalue('//*[local-name()="date"]');
will also find the value and save the hassle of declaring namespaces in an XPathContext. But apart from that, tobyinks answer is the correct one.
Upvotes: 2
Reputation: 13664
The XML::LibXML::Node documentation specifically mentions this issue and how to deal with it...
NOTE ON NAMESPACES AND XPATH:
A common mistake about XPath is to assume that node tests consisting of an element name with no prefix match elements in the default namespace. This assumption is wrong - by XPath specification, such node tests can only match elements that are in no (i.e. null) namespace.
So, for example, one cannot match the root element of an XHTML document with
$node->find('/html')
since'/html'
would only match if the root element<html>
had no namespace, but all XHTML elements belong to the namespace http://www.w3.org/1999/xhtml. (Note thatxmlns="..."
namespace declarations can also be specified in a DTD, which makes the situation even worse, since the XML document looks as if there was no default namespace).There are several possible ways to deal with namespaces in XPath:
The recommended way is to use the XML::LibXML::XPathContext module to define an explicit context for XPath evaluation, in which a document independent prefix-to-namespace mapping can be defined. For example:
my $xpc = XML::LibXML::XPathContext->new; $xpc->registerNs('x', 'http://www.w3.org/1999/xhtml'); $xpc->find('/x:html',$node);
Another possibility is to use prefixes declared in the queried document (if known). If the document declares a prefix for the namespace in question (and the context node is in the scope of the declaration), XML::LibXML allows you to use the prefix in the XPath expression, e.g.:
$node->find('/x:html');
Upvotes: 5