mchr

Reputation: 6251

Parsing issue with xml: namespace attribute with Perl LibXML

I am trying to parse an XML file with the following contents:

<?xml version="1.0" encoding="UTF-8"?>
<sentences>
<lastmodified>none</lastmodified>
<sentencedefs xml:lang="common">
</sentencedefs>
<sentencedefs xml:lang="en-US">
<baselanguage xml:lang="en-US"/>
</sentencedefs>
</sentences>

The Perl code I use to parse it looks like this (a cut-down version of the key portion of the code):

use 5.006_001;
use strict;
use warnings;
use English '-no_match_vars';
use XML::LibXML;

my $SENTENCEDEFS       = "sentencedefs";
my $LANG               = "lang";

my $lParser = XML::LibXML->new;
my $lSentencesDoc  = $lParser->parse_file("sentences.xml");
my $lSentencesRoot = $lSentencesDoc->documentElement();
my @lSentenceDefs = $lSentencesRoot->getElementsByTagName($SENTENCEDEFS);

foreach my $lDefs (@lSentenceDefs)
{
  my @lAttrs = $lDefs->attributes();
  foreach my $lAttr (@lAttrs)
  {
    print("Attr: " . $lAttr->toString(1) . "\n");
  }

  my $lLang = $lDefs->getAttribute($LANG);
  my $lFound = defined($lLang);
  print("Found $LANG? $lFound \n");
}

I have previously been using XML::LibXML V1.58. I am now testing against XML::LibXML V1.70 and have found that the output is different:

V1.58:

Attr:  xml:lang="common"
Found lang? 1
Attr:  xml:lang="en-US"
Found lang? 1

V1.70:

Attr:  xml:lang="common"
Found lang?
Attr:  xml:lang="en-US"
Found lang?

V1.70 only finds the attribute if I use $LANG="xml:lang".

Can anyone explain why LibXML V1.70 is handling my XML differently? Is there a change I can make to my code to make it behave the same when running with both V1.58 and V1.70? I can't change the XML document.

Upvotes: 2

Views: 664

Answers (1)

ikegami

Reputation: 385496

I suspect it has more to do with the version of the underlying libxml2 library than with XML::LibXML itself. Either way, the behaviour changed because the old version gave the wrong answer: the element has no attribute named lang in the null namespace. Its only attribute is lang in the namespace bound to the xml prefix.

The proper call (as defined here) is

$element->getAttributeNS('http://www.w3.org/XML/1998/namespace', 'lang')
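As a minimal sketch of that call in context, the snippet below parses the XML from the question inline and looks up the attribute by namespace URI and local name. The variable names ($XML_NS, @langs) are illustrative only, and parse_string is used instead of parse_file just to keep the example self-contained. Like the call above, it has not been verified against V1.58.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URI that the reserved "xml" prefix is always bound to.
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Same content as sentences.xml in the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<sentences>
<lastmodified>none</lastmodified>
<sentencedefs xml:lang="common">
</sentencedefs>
<sentencedefs xml:lang="en-US">
<baselanguage xml:lang="en-US"/>
</sentencedefs>
</sentences>
XML

my $doc = XML::LibXML->new->parse_string($xml);

my @langs;
foreach my $defs ($doc->documentElement->getElementsByTagName('sentencedefs')) {
    # Namespace-aware lookup: (namespace URI, local name), no prefix involved.
    my $lang = $defs->getAttributeNS($XML_NS, 'lang');
    push @langs, $lang;
    print 'Found lang? ', (defined $lang ? $lang : '(none)'), "\n";
}
```

Looking the attribute up by URI rather than by the literal string "xml:lang" keeps the code independent of how a given version treats prefixed names.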

I don't have the means to test whether this works in both of your environments. If it doesn't, you could always make the code conditional on

$XML::LibXML::VERSION         # Version of XML::LibXML (e.g. 1.70)

or

XML::LibXML::LIBXML_VERSION   # Version of libxml2 (e.g. 20707 for 2.7.7)
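An alternative to branching on version numbers is to try the lookups in order and take the first one that returns a defined value. This is a different technique from the version check suggested above, and the helper name get_xml_lang is hypothetical. It assumes that every version involved returns undef for a missing attribute, which matches the output in the question but is otherwise untested on V1.58.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URI that the reserved "xml" prefix is always bound to.
my $XML_NS = 'http://www.w3.org/XML/1998/namespace';

# Hypothetical helper: returns the xml:lang value of an element, or undef.
sub get_xml_lang {
    my ($element) = @_;

    # 1. Correct, namespace-aware lookup.
    my $lang = $element->getAttributeNS($XML_NS, 'lang');

    # 2. Qualified-name lookup, which the question reports working on V1.70.
    $lang = $element->getAttribute('xml:lang') unless defined $lang;

    # 3. Legacy lookup that V1.58 accepted. Caveat: this also matches a
    #    genuine no-namespace "lang" attribute if the element has one.
    $lang = $element->getAttribute('lang') unless defined $lang;

    return $lang;
}

# Report the versions in play, using the two values mentioned above.
print "XML::LibXML $XML::LibXML::VERSION, libxml2 ",
    XML::LibXML::LIBXML_VERSION(), "\n";

my $doc  = XML::LibXML->new->parse_string('<sentencedefs xml:lang="en-US"/>');
my $lang = get_xml_lang($doc->documentElement);
print 'lang: ', (defined $lang ? $lang : '(none)'), "\n";
```

If the third fallback is too permissive for your documents, drop it and keep only the first two lookups.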

Upvotes: 3
