cybertextron

Reputation: 10961

error during xml parsing - Perl

I'm trying to use XML::Tidy in order to indent an XML file:

sub reformatXML {
    #
    # the only argument to this function is the file name
    #
    my $file = $_[ 0 ];
    #
    # create a new XML::Tidy object from $file
    #
    my $tidy = XML::Tidy->new( 'filename' => $file );
    #
    # Tidy up the indenting
    #
    $tidy->tidy();
    #
    # write out changes back to the file
    #
    $tidy->write();
    print "$file was reformatted.\n";
    return;
}

sub main(){
    #
    # get the directory the program
    # is currently running in
    #
    #my $current_dir = getcwd;
    #iterateDir( $current_dir );
    my $file = "/path/to/xml/file/autotest.xml";
    reformatXML( $file );
}

As simple as that. However, when I call my main() function I get:

501 Protocol scheme 'd' is not supported d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
Handler couldn't resolve external entity at line 2, column 29, byte 73
error in processing external entity reference at line 2, column 29, byte 73:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
============================^
<kit>
  <contact/>
 at C:/xampp/perl/site/lib/XML/Parser.pm line 187

I'm new to Perl, and I have no idea what causes this error. Could someone please help me figure it out?

The head of the XML file is:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
<kit>
  <contact/>
  <description>autotest files</description>
  <history>
    <hist>06-May-2005                  Created</hist>
    <hist>17-Jun-2005            Add autotest.jar to rtkit</hist>
    <hist>29-Jun-2005            Remove bits picked up elsewhere</hist>
    <hist>15-Jul-2005            Added acad_add_note_types.</hist>
    <hist>20-Sep-2005            Add ai stuff</hist>
    <hist>31-Oct-2005            DMS BnT fixes</hist>
    <hist>03-Nov-2005            Pander to kitting's obsession about unique filenames</hist>
    <hist>17-Nov-2005            Add ics schema and junit</hist>
    <hist>09-Dec-2005            add gdt_autotest</hist>
    <hist>11-Jan-2006            Merge in P10.0.1.5</hist>
    <hist>16-Jan-2006      Merge</hist>
    <hist>26-Jan-2006      Need inclass.plmxml to pass tceng_util autotest</hist>
    <hist>06-Mar-2006      Add qdiff.pl</hist>
    <hist>09-Mar-2006      Kernel tests need a couple fms client files</hist>
    <hist>10-Mar-2006      Missing dependent library</hist>
    <hist>19-Jan-2006      Merged from timb_gmo</hist>
    <hist>17-Jan-2006      GMO Kernel Autotests Implementati

Upvotes: 0

Views: 759

Answers (1)

ikegami

Reputation: 385715

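XML::Tidy (or rather, one of the modules it uses) appears to treat an absolute file path as if it were a URL, which it is not. The drive letter d: is read as a URL scheme, hence the "Protocol scheme 'd' is not supported" message. It thinks the URL specified is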

 d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd

when it's really

 file:///d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd
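
To illustrate why the drive letter is mistaken for a scheme, here is a minimal sketch using only core Perl. The helper name apparent_scheme is hypothetical, and the pattern is my approximation of the generic scheme syntax from RFC 3986, not the code that XML::Tidy or its dependencies actually run.

```perl
use strict;
use warnings;

# Hypothetical helper: return the scheme a generic URL parser would see
# at the start of a string, or undef if there is none.
# A scheme is a letter followed by letters, digits, "+", "-" or ".",
# terminated by a colon.
sub apparent_scheme {
    my ($string) = @_;
    return $string =~ /^([A-Za-z][A-Za-z0-9+.-]*):/ ? lc($1) : undef;
}

for my $candidate (
    'd:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd',
    'file:///d:/UDU/r/tc10.0.0.2012080100_buildA/src/build/kits/tc.dtd',
) {
    my $scheme = apparent_scheme($candidate);
    printf "%-6s %s\n", defined $scheme ? $scheme : '(none)', $candidate;
}
```

The bare Windows path reports the scheme d, which no URL fetcher knows how to handle, while the second form reports file.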

I don't know how to work around that bug. You could try changing

my $file = "...";
reformatXML($file);

to

use URI::file;

my $file = "...";
my $url = URI::file->new($file);
reformatXML($url);

That's the immediate bug. Beyond that, there's the issue that a relative URL is given for the DTD. That's not necessarily wrong, but it is a bit odd. It means that tc.dtd must be present in the same directory as autotest.xml. Is that truly the case?
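
A quick way to check that is sketched below, using only core modules. The helper name expected_dtd_path is hypothetical, and the path is the placeholder from the question, so substitute the real location of autotest.xml.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: where a relative SYSTEM identifier such as
# "tc.dtd" should live on disk, given the XML file that refers to it.
sub expected_dtd_path {
    my ($xml_file, $system_id) = @_;
    return File::Spec->catfile(dirname($xml_file), $system_id);
}

my $xml_file = '/path/to/xml/file/autotest.xml';   # placeholder path from the question
my $dtd      = expected_dtd_path($xml_file, 'tc.dtd');

# Report whether the DTD is actually sitting next to the XML file.
my $status = -e $dtd ? 'found' : 'missing';
print "$status: $dtd\n";
```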


Some parsers (e.g. XML::LibXML) have an option to avoid fetching DTDs. Fetching them is usually unnecessary and thus a waste of time, money, CPU and bandwidth. Look for such an option. It would probably be in the constructor of one of the classes that XML::Tidy inherits from.
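
As a rough sketch of what that looks like with XML::LibXML itself, used here instead of XML::Tidy: the load_ext_dtd and no_blanks parser options and toString(1) are part of XML::LibXML, but the shortened document below is only a stand-in for autotest.xml, and I have not tried this against the real file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened stand-in for autotest.xml; tc.dtd does not need to exist.
my $xml = <<'XML';
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE kit SYSTEM "tc.dtd">
<kit><contact/><description>autotest files</description></kit>
XML

my $doc = XML::LibXML->load_xml(
    string       => $xml,
    load_ext_dtd => 0,   # do not fetch the external DTD
    no_blanks    => 1,   # drop whitespace-only text nodes so re-indenting works
);

# Format level 1 asks libxml2 to add line breaks and indentation.
print $doc->toString(1);
```

For a file on disk, load_xml also accepts location => $file in place of string => $xml, and $doc->toFile($file, 1) writes the indented result back.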

Upvotes: 2
