phileas fogg

Reputation: 1933

Handling multiple XML 'documents' within a single file with Perl

Edited: Sorry, I mistyped 'name' when I meant 'ref', and I've now included the complete attributes as well.

I have a number of xml files that contain, on a single line, a complete xml document. An example would be:

<Reqeusts>
    <WRRequest><Request domain="foo.com"><Rows><Row includeascolumn="n" interval="hour" ref="time" type="group"/><Row includeascolumn="n"  ref="domain_id" type="group"/><Row />...</Rows><Columns><Column ref="user_id"/><Column ref="country_id"/><Column ref="country_name"/>...</Columns></Request></WRRequest>
.
.
.
</Requests>

There are a number of attributes as well that I'm not including for the sake of clarity.

I'm parsing this using XML::Parser and XML::SimpleObject, which work fine up to a point. For instance, printing the attributes of each element works, except when I try to print the 'ref' attribute of the Column element; then I get an "uninitialized variable" error. The code is:

#!/usr/bin/perl
use warnings;
use diagnostics;
use XML::Parser;
use XML::SimpleObject;
use Cwd;


if ($ARGV[0] eq "") {
  die "usage: sumXML.pl <input file> \n";
}

my $fileName = $ARGV[0];

my $parser = new XML::Parser(Style => 'Tree');
my $xso = XML::SimpleObject->new( $parser->parsefile("$fileName") );


foreach my $wrRequest ($xso->child('WRRequests')->children('RWRequest')) {
  print "Client Name: " . $wrRequest->attribute('clientName') . "\n";
foreach my $xmlRequest ($wrRequest->child('REQUEST')) {
  print "Domain name: " . $xmlRequest->attribute('domain') . "\n";
  print "Service: " . $xmlRequest->attribute('service') . "\n";
  foreach my $xmlRow ($xmlRequest->child('ROWS')->children('ROW')) {
    print "Row Reference: " . $xmlRow->attribute('ref') . "\n";
  }
  foreach my $xmlColumn ($xmlRequest->child('COLUMNS')->children('COLUMN')) {
    print "Column Reference: " . $xmlColumn->attribute('ref') . "\n";
  }
 }
  print "\n";
}

Upvotes: 2

Views: 846

Answers (2)

runrig

Reputation: 6524

I can't know for sure how the data should ideally be organized, but I find XML::Rules handy in these situations. If you're open to a completely different way of doing it, here is an example (I'm assuming that 'ref' is the key in each row, that column names should be kept in order, and that for columns all you care about is the 'ref' attribute):

use strict;
use warnings;

use Data::Dumper;
use XML::Rules;

my $xml = <<XML;
<Requests>
  <WRRequest>
    <Request domain="foo.com" service="SomeService">
      <Rows>
        <Row includeascolumn="n" interval="hour" ref="time" type="group"/>
        <Row includeascolumn="n"  ref="domain_id" type="group"/>
      </Rows>
      <Columns>
        <Column ref="user_id"/>
        <Column ref="country_id"/>
        <Column ref="country_name"/>
      </Columns>
    </Request>
  </WRRequest>
</Requests>
XML

my @rules = (
  Request => sub { delete $_[1]->{_content}; print Dumper $_[1]; return },
  Rows    => 'pass no content',
  Columns => 'pass no content',
  Row     => 'no content by ref',
  Column  => sub { '@'.$_[0] => $_[1]{ref} },
);

my $p = XML::Rules->new(
  rules => \@rules,
);
$p->parse($xml);

__END__
$VAR1 = {
          'Column' => [
                      'user_id',
                      'country_id',
                      'country_name'
                    ],
          'domain' => 'foo.com',
          'time' => {
                    'type' => 'group',
                    'includeascolumn' => 'n',
                    'interval' => 'hour'
                  },
          'domain_id' => {
                         'type' => 'group',
                         'includeascolumn' => 'n'
                       },
          'service' => 'SomeService'
        };

Upvotes: 1

vstm

Reputation: 12537

Your sample data does not parse (even if you remove the dots), so it is not well-formed XML. I'm not sure what your actual data looks like, but that is quite important for finding the problem.
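As a quick way to see whether a given input parses at all, here is a minimal sketch. It relies only on the fact that XML::Parser dies on input that is not well-formed; the sample string is made up for illustration and simply mirrors the mismatched root tag from the question.

```perl
use strict;
use warnings;
use XML::Parser;

# Illustrative input: the opening root tag is misspelled, so it does not
# match the closing </Requests> tag.
my $sample = '<Reqeusts><WRRequest/></Requests>';

my $parser = XML::Parser->new(Style => 'Tree');

# parse() dies on malformed input, so wrap it in eval and inspect $@.
my $tree = eval { $parser->parse($sample) };

if (defined $tree) {
    print "Input is well-formed\n";
}
else {
    print "Input is not well-formed: $@";
}
```

Running the same check against one line of the real file should show quickly whether the parser or the data is at fault.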

I'm certain that there is nothing wrong with XML::Parser or XML::SimpleObject. So please check the following:

  • Are the element and attribute names spelled correctly? (Remember that XML is case sensitive.)
  • Does the element or attribute actually exist? (For example: does every Request element have a service attribute? Does every Row have a ref attribute?) If it does not exist, you have to either reject the input data or deal with the data you have. This of course depends on your requirements.
  • Optional: validate the XML document against a DTD or XSD to verify the data integrity. This is like the advanced version of the second point.
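For the second point, one way to "deal with the data you have" is to substitute a placeholder when an attribute is missing. The following is only a sketch: attr_or is a hypothetical helper (it is not part of XML::SimpleObject), and the sample XML is invented so that the second Column deliberately lacks a ref attribute.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;

# Hypothetical helper (not part of XML::SimpleObject): return the attribute
# value, or a fallback when the attribute is missing or empty.
sub attr_or {
    my ($node, $name, $fallback) = @_;
    my $value = $node->attribute($name);
    return (defined $value && length $value) ? $value : $fallback;
}

# Made-up sample: the second Column has no 'ref' attribute.
my $xml = '<Requests><WRRequest><Request domain="foo.com"><Columns>'
        . '<Column ref="user_id"/><Column/>'
        . '</Columns></Request></WRRequest></Requests>';

my $parser = XML::Parser->new(Style => 'Tree');
my $xso    = XML::SimpleObject->new($parser->parse($xml));

# Element names must match the document's case exactly.
my $request = $xso->child('Requests')->child('WRRequest')->child('Request');
my @columns = $request->child('Columns')->children('Column');

foreach my $column (@columns) {
    print "Column Reference: " . attr_or($column, 'ref', '(no ref)') . "\n";
}
```

This avoids concatenating an undefined value into the output; whether a placeholder or a hard failure is the right reaction depends on your requirements.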

I have actually taken the time to make it work (by just changing the case of the element-names, and slightly modifying your "sample data"):

use strict;
use warnings;
use XML::Parser;
use XML::SimpleObject;
use Cwd;


my $inXML = join "", <DATA>;
print $inXML;

my $parser = XML::Parser->new(Style => 'Tree');
my $xso = XML::SimpleObject->new( $parser->parse($inXML) );


foreach my $wrRequest ($xso->child('Requests')->children('WRRequest')) {
    print "Client Name: " . $wrRequest->attribute('clientName') . "\n";
    foreach my $xmlRequest ($wrRequest->child('Request')) {
        print "Domain name: " . $xmlRequest->attribute('domain') . "\n";
        print "Service: " . $xmlRequest->attribute('service') . "\n";
        foreach my $xmlRow ($xmlRequest->child('Rows')->children('Row')) {
            print "Row Reference: " . $xmlRow->attribute('ref') . "\n";
        }
        foreach my $xmlColumn ($xmlRequest->child('Columns')->children('Column')) {
            print "Column Reference: " . $xmlColumn->attribute('ref') . "\n";
        }
    }
    print "\n";
}


__DATA__
<Requests>
  <WRRequest clientName="foo">
    <Request service="fooService" domain="foo.com">
      <Rows>
        <Row includeascolumn="n" interval="hour" ref="time" type="group"/>
        <Row includeascolumn="n"  ref="domain_id" type="group"/>
      </Rows>
      <Columns>
        <Column ref="user_id"/>
        <Column ref="country_id"/>
        <Column ref="country_name"/>
      </Columns>
    </Request>
  </WRRequest>
</Requests>

Output:

Client Name: foo
Domain name: foo.com
Service: fooService
Row Reference: time
Row Reference: domain_id
Column Reference: user_id
Column Reference: country_id
Column Reference: country_name

I've also tested it with multiple WRRequest elements (copy and paste), and it worked like a charm.

Upvotes: 1
