Chazg76

LibXML: "xmlns" attribute being reported but not in XML input file

I have the following XML file sheetX.xml (taken from an Excel XML sheet file):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
           xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
           xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
           xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
           xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
           xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
           xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"
           mc:Ignorable="x14ac xr xr2 xr3"
           xr:uid="{109BF357-4A9A-4969-B57D-8A2B0130DC3F}">
  <dimension ref="A1"/>
  <sheetViews>
    <sheetView tabSelected="1" topLeftCell="M1" workbookViewId="0">
      <selection activeCell="A1" sqref="A1"/>
    </sheetView>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>  
  <sheetData/>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>

I am reading the file with the XML::LibXML Perl module:

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new( location => 'sheetX.xml' );
$reader->read();
my $NERROR1 = 0;    # loop flag; must be declared under use strict
while($NERROR1==0){
    my $doc = $reader->copyCurrentNode(1);
    if(!defined $doc){
        $NERROR1=-1;
    } else {
        if($reader->attributeCount()>0){
            print "tag name:" . $reader->name() . "\n";
            my @attributelist = $doc->attributes();
            for my $iAtt (0 .. scalar @attributelist-1){
                print "Att name:" . $attributelist[$iAtt]->nodeName() . "\n";
                print "Att value:" . $attributelist[$iAtt]->value . "\n";
            }
        }
        $reader->nextElement();
    }
}
$reader->close();

The output for some of the tags is:

tag name:worksheet
Att name:mc:Ignorable
Att value:x14ac xr xr2 xr3
Att name:xr:uid
Att value:{00000000-0001-0000-0400-000000000000}
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:mc
Att value:http://schemas.openxmlformats.org/markup-compatibility/2006
Att name:xmlns:r
Att value:http://schemas.openxmlformats.org/officeDocument/2006/relationships
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac
Att name:xmlns:xr
Att value:http://schemas.microsoft.com/office/spreadsheetml/2014/revision
Att name:xmlns:xr2
Att value:http://schemas.microsoft.com/office/spreadsheetml/2015/revision2
Att name:xmlns:xr3
Att value:http://schemas.microsoft.com/office/spreadsheetml/2016/revision3

and

tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main

and

tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac

So, basically, the code prints xmlns attributes for the sheetView and sheetFormatPr tags that do not appear on those tags in the XML file, while the worksheet tag reports exactly the attributes shown in the file and no extra ones.

At some stage I will need to reconstruct the XML file from the data generated by my Perl program (the program also prints out tags, values, etc.). So my question: is there any way to get my Perl program to print only the attributes that appear in the XML file, and not the extra ones?


cxw

Here is the minimal set of changes I know of to exclude the xmlns attributes. Changed lines are marked ###. I am not sure what your other code does with $NERROR1, so I removed it here for simplicity. Most of this is adapted from the XML::LibXML::Reader docs.

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

my $reader = XML::LibXML::Reader->new( location => 'foo.xml' );
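$reader->read();             # Note: together with while($reader->read) below, this extra read skips the root <worksheet> element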

###my $NERROR1;              # Needed to add this because of `use strict`
###while($NERROR1==0){
while($reader->read) {       ### Per the docs.
    my $node = $reader->copyCurrentNode(1);    ### Might not be a document, so $node instead of $doc
###    if(!defined $doc){
###        $NERROR1=-1;
###    } else {
        if($reader->attributeCount>0){
            print "tag name:" . $reader->name . "\n";
###            my @attributelist = $doc->attributes();
###            for my $iAtt (0 .. scalar @attributelist-1){
            for my $att ($node->attributes) {           ### Simpler form of the loop --- don't need the indices.
                next if $att->nodeName =~ /^xmlns\b/;   ### <== The key - skip to the next attribute if this one starts with "xmlns"
                print "Att name:" . $att->nodeName . "\n";
                print "Att value:" . $att->value . "\n";
            }
        }
###        $reader->nextElement();
###    }
}
$reader->close();

Output

tag name:dimension
Att name:ref
Att value:A1
tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
tag name:selection
Att name:activeCell
Att value:A1
Att name:sqref
Att value:A1
tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
tag name:pageMargins
Att name:left
Att value:0.7
Att name:right
Att value:0.7
Att name:top
Att value:0.75
Att name:bottom
Att value:0.75
Att name:header
Att value:0.3
Att name:footer
Att value:0.3
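Filtering on the name drops every namespace declaration, including the ones that really are on <worksheet>, which matters if the goal is to rebuild the file. A minimal sketch of another approach, under the assumption that you only need the attributes as written in the source: read them straight from the reader with moveToFirstAttribute/moveToNextAttribute instead of from a copied node, and use isNamespaceDecl to tell declarations apart from ordinary attributes. Those four methods are XML::LibXML::Reader calls; the helper source_attributes and the trimmed inline sample are hypothetical, for illustration only. This loop also uses a single while($reader->read), so the root element is included.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;    # also exports the XML_READER_TYPE_* constants

# Trimmed copy of the question's sheetX.xml, so the sketch runs on its own.
# For the real file, use location => 'sheetX.xml' instead of string => $xml.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
           xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
  <sheetViews>
    <sheetView tabSelected="1" topLeftCell="M1" workbookViewId="0"/>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
</worksheet>
XML

# Hypothetical helper: attributes of the element the reader is on, as they
# appear in the source. Returns a list of [name, value, is_namespace_decl].
sub source_attributes {
    my ($reader) = @_;
    my @atts;
    # moveTo* return 1 on success, 0 if there is nothing to move to, -1 on error
    if ($reader->moveToFirstAttribute == 1) {
        do {
            push @atts, [ $reader->name, $reader->value, $reader->isNamespaceDecl == 1 ];
        } while ($reader->moveToNextAttribute == 1);
        $reader->moveToElement;    # go back to the element before reading on
    }
    return @atts;
}

my $reader = XML::LibXML::Reader->new( string => $xml );
my @elements;    # [tag name, [attributes]] for each start tag, in document order
while ($reader->read == 1) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    push @elements, [ $reader->name, [ source_attributes($reader) ] ];
}
$reader->close;

for my $el (@elements) {
    my ($tag, $atts) = @$el;
    print "tag name:$tag\n";
    for my $att (@$atts) {
        my ($name, $value, $is_ns) = @$att;
        print "Att name:$name", ($is_ns ? " (namespace declaration)" : ""), "\n";
        print "Att value:$value\n";
    }
}
```

Because the reader walks the parsed document itself rather than detached copies, the xmlns declarations should show up only on <worksheet>, where they are written, and you can keep or drop them depending on whether you are rebuilding the file.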

Explanation

I found a PerlMonks thread that links to RFC 4918, p. 40, which clarifies that

Since the "xmlns" attribute does not contain a prefix, the namespace applies by default to all enclosed elements.

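In this case, the <worksheet> tag declares the default namespace xmlns="http://schemas...2006/main". That applies to the contained elements, so the <sheetView> and <sheetFormatPr> tags inside <worksheet> are also in that default namespace. copyCurrentNode returns a standalone copy of the current node, and so that the copy keeps its meaning outside <worksheet>, libxml2 adds declarations for the namespaces the copy uses. That is why the copied <sheetView> reports an xmlns attribute, and the copied <sheetFormatPr> reports both xmlns and xmlns:x14ac (the latter for its x14ac:dyDescent attribute).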
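To check that the nested elements really are in the default namespace even though they carry no xmlns attribute in the file, a small sketch can ask the reader for each element's localName and namespaceURI (both XML::LibXML::Reader methods). The inline XML is a trimmed, hypothetical sample of the question's file.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;    # exports XML_READER_TYPE_ELEMENT

my $xml = <<'XML';
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
           xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
  <sheetViews><sheetView tabSelected="1"/></sheetViews>
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
</worksheet>
XML

my %ns_of;    # local element name => namespace URI the parser resolved for it
my $reader = XML::LibXML::Reader->new( string => $xml );
while ($reader->read == 1) {
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    $ns_of{ $reader->localName } = $reader->namespaceURI;
    printf "%-14s %s\n", $reader->localName, $reader->namespaceURI;
}
$reader->close;
```

Every element, including <sheetView> and <sheetFormatPr>, should report the .../spreadsheetml/2006/main URI inherited from <worksheet>.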
