David M. Karr

Reputation: 15205

How to make Perl XML::XPath allow queries without namespace prefixes?

I'm trying to use XML::XPath to extract content from XML documents. The documents are specified with namespaces, but I want to use XPath expressions without namespaces. As far as I can tell, I had this working perfectly fine in two different scripts.

Sometime today, the behavior of XML::XPath seems to have changed in this respect, and I can't see what I could have changed to cause it.

I can get some manual tests to work if I spell out the namespaces almost completely: I call "set_namespace()" in the script (hardcoding the prefix I expect to use) and use that prefix in the XPath expression.

Again, I'm pretty sure I had this working yesterday, without calling "set_namespace()" in the script, or specifying prefixes in the XPath expressions.

Unless I add that "set_namespace()" call and use prefixes in the expression, my queries just return empty nodesets.
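
For reference, the prefixed approach described above might look like the following minimal sketch. The prefix "p" is an arbitrary choice for illustration; the assumption is that it only needs to be bound to the same URI ("xxx") that the sample document further down uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Same sample document as in the command line shown later in the question.
my $xml = '<ns3:abc xmlns:ns3="xxx"><ns3:def>ghi</ns3:def></ns3:abc>';
my $xp  = XML::XPath->new(xml => $xml);

# Bind an arbitrary prefix to the document's namespace URI.
# "p" is illustrative; it does not have to match the document's "ns3".
$xp->set_namespace(p => 'xxx');

# The prefix then has to appear in the expression itself.
my $value = $xp->findvalue('//p:def');
print "$value\n";
```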

I tried setting "$XML::XPath::Namespaces" to zero before I create the first XPath object, but that doesn't seem to make any difference.

The following is a simple script I pipe XML into:

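#! /bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;
use Getopt::Long;

$| = 1;    # unbuffered output

my $opt_file;
GetOptions("f|file=s" => \$opt_file);

# True only if -f/--file was given a non-empty value (avoids comparing undef).
my $have_file = defined $opt_file && $opt_file ne '';

# Attempt to switch off namespace handling.
$XML::XPath::Namespaces = 0;

# Parse the named file if there is one, otherwise read XML from STDIN.
my $xpath;
if ($have_file) {
    $xpath = XML::XPath->new(filename => $opt_file);
}
else {
    $xpath = XML::XPath->new(ioref => \*STDIN);
}

# Every remaining argument is an XPath expression to evaluate.
while (my $expr = shift @ARGV) {
    my $nodeset = $xpath->find($expr);
    if ($nodeset) {
        if ($have_file) {
            print $opt_file . ":\n";
        }
        for my $node ($nodeset->get_nodelist) {
            print $node->string_value() . "\n";
        }
    }
}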

Here's a sample command line:

% echo "<ns3:abc xmlns:ns3=\"xxx\"><ns3:def>ghi</ns3:def></ns3:abc>" | xpathtext "//def"

I would hope to get "ghi" from this, but I'm currently getting nothing.

Upvotes: 2

Views: 508

Answers (1)

ikegami

Reputation: 385496

Wow, that module is buggy.

Let's forget about your question for a minute and use $XML::XPath::Namespaces=1; (the default) for now.

  1. $ perl -E'say q{<r><e>E</e></r>}' |
       xpathtext //e
    E
    

    Correct. There is an e element in the null namespace.

  2. $ perl -E'say q{<r xmlns:p="http://n"><p:e>E</p:e></r>}' |
       xpathtext //e
    [nothing]
    

    Correct. There are no e elements in the null namespace.

  3. $ perl -E'say q{<r xmlns="http://n"><e>E</e></r>}' |
       xpathtext //e
    E
    

    Incorrect. There are no e elements in the null namespace, but one was printed.

  4. $ perl -E'say q{<r><e xmlns="http://n">E</e></r>}' |
       xpathtext //e
    E
    

    Incorrect. There are no e elements in the null namespace, but one was printed.

  5. $ perl -E'say q{<r xmlns:p="http://n"><p:e>E</p:e></r>}' |
       xpathtext //p:e
    E
    

    Incorrect. This should be an error: the prefix p was never bound to a namespace on the XPath side, so there's no way of knowing whether p in the XPath refers to the http://n namespace or not.

  6. $ perl -E'say q{<r xmlns="http://n"><e>E</e></r>}' |
       xpathtext //p:e
    [nothing]
    

    Incorrect. This should be an error: the prefix p was never bound to a namespace on the XPath side, so there's no way of knowing whether p in the XPath refers to the http://n namespace or not.

Given this level of bugginess, it's no surprise you're having problems.


Now let's find out what $XML::XPath::Namespaces=0; does.

After rerunning the above programs with $XML::XPath::Namespaces=0;, we find that the answer is "absolutely nothing".

I've confirmed this by attaching magic to the variable. The variable is never used (in the latest version, XML-XPath-1.13)!
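
How the magic was attached isn't shown. A minimal sketch of one way to do this kind of check uses Perl's core tie mechanism. The AccessCounter class and the $Flag variable are hypothetical names for illustration; $Flag stands in for $XML::XPath::Namespaces so that the sketch runs without the module installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper class: counts every read and write of a tied scalar.
package AccessCounter;

sub TIESCALAR {
    my ($class, $initial) = @_;
    return bless { value => $initial, reads => 0, writes => 0 }, $class;
}

# Called whenever the tied variable is read.
sub FETCH {
    my ($self) = @_;
    $self->{reads}++;
    return $self->{value};
}

# Called whenever the tied variable is assigned to.
sub STORE {
    my ($self, $new) = @_;
    $self->{writes}++;
    $self->{value} = $new;
}

package main;

# Stand-in for $XML::XPath::Namespaces (an assumption made for this sketch).
our $Flag;
my $counter = tie $Flag, 'AccessCounter', 1;

$Flag = 0;    # one write, like setting $XML::XPath::Namespaces = 0

# The code under test would run here. If it never consults the flag,
# the read counter stays at zero.

print "reads=$counter->{reads} writes=$counter->{writes}\n";
```

To try this against the real module, one would presumably tie $XML::XPath::Namespaces itself after loading XML::XPath, run a few queries, and then inspect the read counter.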

So the module half does what you want and half does what it should, with no apparent means of customising it.
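
One possible way to sidestep prefixes altogether is to test the element's local name with the standard XPath local-name() function instead of relying on how the module resolves names. The following is a minimal sketch, assuming XML::XPath's local-name() follows the XPath 1.0 definition; it was not part of the tests above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Sample document from the question.
my $xml = '<ns3:abc xmlns:ns3="xxx"><ns3:def>ghi</ns3:def></ns3:abc>';
my $xp  = XML::XPath->new(xml => $xml);

# Match any element whose local name is "def", whatever its namespace
# or prefix, so no set_namespace() call is needed.
my @values = map { $_->string_value }
             $xp->findnodes(q{//*[local-name()='def']});

print "$_\n" for @values;
```

The trade-off is that such an expression matches def elements from every namespace, which may be too broad for documents that mix vocabularies.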

Upvotes: 1
