Reputation: 649
I have the following XML file, sheetX.xml (taken from an Excel XML sheet file):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"
mc:Ignorable="x14ac xr xr2 xr3"
xr:uid="{109BF357-4A9A-4969-B57D-8A2B0130DC3F}">
<dimension ref="A1"/>
<sheetViews>
<sheetView tabSelected="1" topLeftCell="M1" workbookViewId="0">
<selection activeCell="A1" sqref="A1"/>
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
<sheetData/>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>
I am reading the file with the XML::LibXML Perl module:
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new( location => 'sheetX.xml' );
$reader->read();
while($NERROR1==0){
my $doc = $reader->copyCurrentNode(1);
if(!defined $doc){
$NERROR1=-1;
} else {
if($reader->attributeCount()>0){
print "tag name:" . $reader->name() . "\n";
my @attributelist = $doc->attributes();
for my $iAtt (0 .. scalar @attributelist-1){
print "Att name:" . $attributelist[$iAtt]->nodeName() . "\n";
print "Att value:" . $attributelist[$iAtt]->value . "\n";
}
}
$reader->nextElement();
}
}
$reader->close();
The output from the Perl program for some of the tags is:
tag name:worksheet
Att name:mc:Ignorable
Att value:x14ac xr xr2 xr3
Att name:xr:uid
Att value:{00000000-0001-0000-0400-000000000000}
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:mc
Att value:http://schemas.openxmlformats.org/markup-compatibility/2006
Att name:xmlns:r
Att value:http://schemas.openxmlformats.org/officeDocument/2006/relationships
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac
Att name:xmlns:xr
Att value:http://schemas.microsoft.com/office/spreadsheetml/2014/revision
Att name:xmlns:xr2
Att value:http://schemas.microsoft.com/office/spreadsheetml/2015/revision2
Att name:xmlns:xr3
Att value:http://schemas.microsoft.com/office/spreadsheetml/2016/revision3
and
tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
and
tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac
So, basically, the code is printing out xmlns attributes that are not shown in the XML file for the sheetView and sheetFormatPr tags, but the worksheet tag has all the attributes that are shown in the file and no extra ones.
At some stage I will need to reconstruct the XML file from the data generated by my Perl program (the program also prints out tags, values, etc.). So my question is: is there any way to get my Perl program to print only the attributes that appear in the XML file, and not the extra ones?
Upvotes: 2
Views: 360
Reputation: 17031
Here is the minimal set of changes I know of to exclude the xmlns attributes. Note that changed lines are marked with ###. I am not sure what your other code may be doing with $NERROR1, so I removed it here for simplicity. Most of this is adapted from the docs.
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new( location => 'foo.xml' );
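$reader->read(); ### Note: this moves to <worksheet>; the loop's read() then moves past it, so <worksheet> is not in the output below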
###my $NERROR1; # Needed to add this because of `use strict`
###while($NERROR1==0){
while($reader->read) { ### Per the docs.
my $node = $reader->copyCurrentNode(1); ### Might not be a document, so $node instead of $doc
### if(!defined $doc){
### $NERROR1=-1;
### } else {
if($reader->attributeCount>0){
print "tag name:" . $reader->name . "\n";
### my @attributelist = $doc->attributes();
### for my $iAtt (0 .. scalar @attributelist-1){
for my $att ($node->attributes) { ### Simpler form of the loop --- don't need the indices.
next if $att->nodeName =~ /^xmlns\b/; ### <== The key - skip to the next attribute if this one starts with "xmlns"
print "Att name:" . $att->nodeName . "\n";
print "Att value:" . $att->value . "\n";
}
}
### $reader->nextElement();
### }
}
$reader->close();
Output
tag name:dimension
Att name:ref
Att value:A1
tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
tag name:selection
Att name:activeCell
Att value:A1
Att name:sqref
Att value:A1
tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
tag name:pageMargins
Att name:left
Att value:0.7
Att name:right
Att value:0.7
Att name:top
Att value:0.75
Att name:bottom
Att value:0.75
Att name:header
Att value:0.3
Att name:footer
Att value:0.3
I found a PerlMonks thread that links to RFC 4918, p. 40, which clarifies that
Since the "xmlns" attribute does not contain a prefix, the namespace applies by default to all enclosed elements.
In this case, the <worksheet> tag declares the default namespace xmlns="http://schemas...2006/main". That applies to the contained elements, so the <sheetView> and <sheetFormatPr> tags inside <worksheet> also have that default namespace. XML::LibXML::Reader is giving you access to that information by reporting an xmlns attribute on those nodes.
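If you would rather not filter by name, another option is to walk the attributes with the reader's own cursor methods (moveToFirstAttribute, moveToNextAttribute, moveToElement) instead of copying the node. This is only a sketch, not something from the question or the code above: the helper name element_attributes and the cut-down inline XML are made up for illustration. I would expect it to report only what is written on each element, including the xmlns declarations on <worksheet>, but please check it against your own file and libxml2 version.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical helper (not part of XML::LibXML): collects
# [tag, attribute name, attribute value] triples for every start tag,
# using the reader's attribute cursor rather than copyCurrentNode().
sub element_attributes {
    my ($reader) = @_;
    my @found;
    while ( $reader->read ) {
        # Skip text, whitespace and end-element nodes.
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        my $tag = $reader->name;
        # moveToFirstAttribute returns 1 only if the element has attributes.
        next unless $reader->moveToFirstAttribute == 1;
        do {
            push @found, [ $tag, $reader->name, $reader->value ];
        } while ( $reader->moveToNextAttribute == 1 );
        # Put the cursor back on the element before reading on.
        $reader->moveToElement;
    }
    return @found;
}

# Cut-down stand-in for sheetX.xml (assumption: same namespace layout).
my $xml = <<'XML';
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
  <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
</worksheet>
XML

my $reader = XML::LibXML::Reader->new( string => $xml );
for my $row ( element_attributes($reader) ) {
    my ( $tag, $name, $value ) = @$row;
    print "tag name:$tag\n";
    print "Att name:$name\n";
    print "Att value:$value\n";
}
$reader->close;
```

For the real file you would pass location => 'sheetX.xml' instead of string => $xml. Since nothing is copied out of the document, this may also be a better fit if you later need to rebuild the file exactly as written.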
Upvotes: 3