I'm trying to use XML::Tidy in order to indent an XML file:
use strict;
use warnings;
use XML::Tidy;

sub reformatXML {
    #
    # the only argument to this function is the file name
    #
    my ($file) = @_;
    #
    # create a new XML::Tidy object from $file
    #
    my $tidy = XML::Tidy->new( 'filename' => $file );
    #
    # tidy up the indenting
    #
    $tidy->tidy();
    #
    # write the changes back to the file
    #
    $tidy->write();
    print "$file was reformatted.\n";
    return;
}

sub main {
    #
    # get the directory the program is running in
    #
    #my $current_dir = getcwd;
    #iterateDir( $current_dir );
    my $file = "/path/to/xml/file/autotest.xml";
    reformatXML( $file );
}

main();
As simple as that. However, when I call my main() function, I get:
501 Protocol scheme 'd' is not supported d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
Handler couldn't resolve external entity at line 2, column 29, byte 73
error in processing external entity reference at line 2, column 29, byte 73:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
============================^
<kit>
<contact/>
at C:/xampp/perl/site/lib/XML/Parser.pm line 187
I'm new to Perl, and I have no idea why I get that error. Could someone please help me figure it out?
The head of the XML file is:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
<kit>
<contact/>
<description>autotest files</description>
<history>
<hist>06-May-2005 Created</hist>
<hist>17-Jun-2005 Add autotest.jar to rtkit</hist>
<hist>29-Jun-2005 Remove bits picked up elsewhere</hist>
<hist>15-Jul-2005 Added acad_add_note_types.</hist>
<hist>20-Sep-2005 Add ai stuff</hist>
<hist>31-Oct-2005 DMS BnT fixes</hist>
<hist>03-Nov-2005 Pander to kitting's obsession about unique filenames</hist>
<hist>17-Nov-2005 Add ics schema and junit</hist>
<hist>09-Dec-2005 add gdt_autotest</hist>
<hist>11-Jan-2006 Merge in P10.0.1.5</hist>
<hist>16-Jan-2006 Merge</hist>
<hist>26-Jan-2006 Need inclass.plmxml to pass tceng_util autotest</hist>
<hist>06-Mar-2006 Add qdiff.pl</hist>
<hist>09-Mar-2006 Kernel tests need a couple fms client files</hist>
<hist>10-Mar-2006 Missing dependent library</hist>
<hist>19-Jan-2006 Merged from timb_gmo</hist>
<hist>17-Jan-2006 GMO Kernel Autotests Implementati
XML::Tidy (or rather, one of the modules it uses) appears to expect the absolute path to the file to be a valid URL, and it isn't. It thinks the URL specified is
d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
when it's really
file:///d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
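To see why the message talks about a protocol scheme 'd', here is a small core-Perl sketch that reads both strings with the RFC 3986 scheme syntax (a letter, then letters, digits, "+", "-" or ".", then a colon). The helper uri_scheme is hypothetical and is not the code LWP actually runs; it only illustrates that a drive letter followed by a colon is, syntactically, a URL scheme.

```perl
use strict;
use warnings;

# Hypothetical helper: return the scheme of a URI reference using the
# RFC 3986 rule  scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
sub uri_scheme {
    my ($ref) = @_;
    return $ref =~ /^([A-Za-z][A-Za-z0-9+.-]*):/ ? lc $1 : undef;
}

my $bare_path = 'd:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd';
my $file_url  = 'file:///d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd';

# The drive letter "d" matches the scheme rule, so a URL library sees scheme "d".
my $bare_scheme = uri_scheme($bare_path);
my $url_scheme  = uri_scheme($file_url);

print "bare path -> scheme '$bare_scheme'\n";
print "file URL  -> scheme '$url_scheme'\n";
```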
I don't know how to work around that bug properly, but you could try changing
my $file = "...";
reformatXML($file);
to
use URI::file;
my $file = "...";
my $url = URI::file->new($file);
reformatXML($url);
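As a rough check of what URI::file produces for a Windows path, here is a sketch. It assumes the URI distribution from CPAN is installed (it is not core Perl) and passes 'win32' as the second argument so the path is interpreted with Windows rules on any OS. Whether XML::Tidy's filename option accepts the resulting URL in place of a plain path is not confirmed; that is what the change above is trying out.

```perl
use strict;
use warnings;
use URI::file;   # from the CPAN URI distribution (assumed installed)

# The DTD path from the error message, written with Windows backslashes.
my $path = 'd:\\UDU\\r\\tc10.0.0.2012080100_buildA\\src\\build\\kits\\tc.dtd';

# 'win32' forces Windows path syntax: backslashes become slashes and the
# drive letter ends up inside the path, after an empty authority (file:///).
my $url = URI::file->new($path, 'win32');

print "$url\n";
```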
That's the immediate bug. Beyond that, there's the issue that a relative URL is given for the DTD. That's not necessarily wrong, but it is a bit odd. It means that tc.dtd must be present in the same directory as autotest.xml. Is that truly the case?
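A relative system identifier such as "tc.dtd" is resolved against the location of the document that contains it, which is why the DTD has to sit next to the XML file. A sketch with URI's new_abs, again assuming the CPAN URI distribution; the document URL is a made-up example path, not the asker's real one.

```perl
use strict;
use warnings;
use URI;   # from the CPAN URI distribution (assumed installed)

# Hypothetical location of the XML file, written as a file URL.
my $doc_url = 'file:///d:/example/kits/autotest.xml';

# <!DOCTYPE kit SYSTEM "tc.dtd"> holds a relative reference; it is resolved
# against the containing document's URL (RFC 3986, section 5).
my $dtd_url = URI->new_abs('tc.dtd', $doc_url);

print "$dtd_url\n";
```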
Some parsers (e.g. XML::LibXML) have an option to avoid fetching DTDs. Fetching the DTD is usually unnecessary, and thus a waste of time, money, CPU and bandwidth. Look for such an option. It would probably be in the constructor of one of the classes XML::Tidy inherits from.
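For comparison, here is a sketch that swaps XML::Tidy for XML::LibXML (a CPAN wrapper around libxml2, assumed installed) with DTD loading turned off. load_ext_dtd, no_network and no_blanks are XML::LibXML parser options; the inline document is a cut-down copy of the question's file, and tc.dtd deliberately does not exist. This is not how XML::Tidy itself is configured.

```perl
use strict;
use warnings;
use XML::LibXML;   # CPAN module wrapping libxml2 (assumed installed)

# Cut-down copy of the question's document; tc.dtd is not available here.
my $xml = <<'XML';
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
<kit><contact/><description>autotest files</description></kit>
XML

my $parser = XML::LibXML->new(
    load_ext_dtd => 0,   # do not load the external DTD subset at all
    no_network   => 1,   # never fetch anything over the network
    no_blanks    => 1,   # drop whitespace-only text so output can be re-indented
);
my $doc = $parser->parse_string($xml);

# toString(1) asks libxml2 to add newlines and indentation.
my $tidied = $doc->toString(1);
print $tidied;
```

To rewrite a real file in place, parsing with $parser->parse_file($file) and then calling $doc->toFile($file, 1) should do it, though the indentation style may differ from XML::Tidy's.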