javed

Reputation: 447

perl: how to parse an xml file sequentially

I have an XML file that describes the data structure I exchange over a UDP channel. For example, here is my input XML file:

<ds>
 <uint32 name='a'/>
 <uint32 name='b'/>
 <string name='c'/>
 <int16 name='d'/>
 <uint32 name='e'/>
</ds>

Parsing this XML file with Perl's XML::Simple gives me the following hash:

$VAR1 = {
          'uint32' => {
                      'e' => {},
                      'a' => {},
                      'b' => {}
                    },
          'int16' => {
                     'name' => 'd'
                   },
          'string' => {
                      'name' => 'c'
                    }
        };

As you can see, the hash does not preserve document order, so after parsing I have no way to work out the position of field 'e' relative to the start of the data structure.

I would like to find out offsets of each of these elements.

I tried searching for a perl XML parser which allows me to parse an XML file sequentially, something like a 'getnexttag()' kind of a functionality, but could not find any.

What is the best way to do this programmatically? If not perl, then which other language is best suited to do this work?

Upvotes: 4

Views: 2385

Answers (3)

draegtun

Reputation: 22560

Here is an example using XML::Twig

use XML::Twig;

XML::Twig->new( twig_handlers => { 'ds/*' => \&each_child } )
         ->parse( $your_xml_data );

sub each_child {
    my ($twig, $child) = @_;
    printf "tag %s : name = %s\n", $child->name, $child->{att}->{name};
}

This outputs:

tag uint32 : name = a
tag uint32 : name = b
tag string : name = c
tag int16 : name = d
tag uint32 : name = e

Upvotes: 1

Zaid

Reputation: 37146

It most certainly is possible with Perl.

Here's an example with XML::LibXML :

use strict;
use warnings;
use feature 'say';
use XML::LibXML;

my $xml = XML::LibXML->load_xml( location => 'test.xml' );

my ( $dsNode ) = $xml->findnodes( '/ds' );

my @kids = $dsNode->nonBlankChildNodes;     # The indices of this array will
                                            # give the offset

my $first_kid = shift @kids;                # Pull off the first kid (the remaining
                                            # indices in @kids shift down by one)
say $first_kid->toString;                   # <uint32 name="a"/>

my $second = $first_kid->nextNonBlankSibling();     
my $third  = $second->nextNonBlankSibling();

say $third->toString;                       # <string name="c"/>
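
To get the position of every field in one pass, the child elements can be walked by index. This is a sketch rather than a definitive solution: it parses the question's XML from a string instead of test.xml, uses findnodes('/ds/*') so that only element nodes are returned, and the %position hash is a name introduced here for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => <<'EOT' );
<ds>
 <uint32 name='a'/>
 <uint32 name='b'/>
 <string name='c'/>
 <int16 name='d'/>
 <uint32 name='e'/>
</ds>
EOT

# findnodes returns the matching elements in document order.
my @fields = $doc->findnodes('/ds/*');

my %position;    # field name => zero-based position within <ds>
for my $i ( 0 .. $#fields ) {
    my $node = $fields[$i];
    my $name = $node->getAttribute('name');
    $position{$name} = $i;
    printf "%d: %s %s\n", $i, $node->nodeName, $name;
}
```

After the loop, $position{e} holds the zero-based position of field 'e', which can then be combined with per-type sizes to derive byte offsets.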

Upvotes: 2

Filip Roséen

Reputation: 63807

You'll need a streaming parser with the appropriate callbacks. For larger data sets this can also improve parsing speed and, if done correctly, reduce memory consumption.

I recommend XML::SAX; an introduction to the module is available under the following link:

Provide a callback for start_element; that way you can read each element one at a time, in document order.


Could you write me an easy example?

Yes, and I already have! ;-)

The snippet below parses the data the OP provided and prints the name of each element, along with its attribute keys and values.

It should be easy to follow, but if you have any questions, feel free to leave a comment and I'll update this post with more detail.

use warnings;
use strict;

use XML::SAX;

my $parser = XML::SAX::ParserFactory->parser(
  Handler => ExampleHandler->new
);

$parser->parse_string (<<EOT
<ds>
  <uint32 name='a'/>
  <uint32 name='b'/>
  <string name='c'/>
  <int16 name='d'/>
  <uint32 name='e'/>
</ds>
EOT
);

# # # # # # # # # # # # # # # # # # # # # # # #

package ExampleHandler;

use base ('XML::SAX::Base');

sub start_element {
  my ($self, $el) = @_;

  print "found element: ", $el->{Name}, "\n";

  for my $attr (values %{$el->{Attributes}}) {
    print "  '", $attr->{Name}, "' = '", $attr->{Value}, "'\n";
  }

  print "\n";
}

Output:

found element: ds

found element: uint32
  'name' = 'a'

found element: uint32
  'name' = 'b'

found element: string
  'name' = 'c'

found element: int16
  'name' = 'd'

found element: uint32
  'name' = 'e'

I'm not satisfied with XML::SAX, are there any other modules available?

Yes, there are plenty to choose from. Read the following list and choose the one that you find fitting for your specific problem:


What is the difference between different methods of parsing?

I also recommend reading the following FAQ on XML parsing. It covers the pros and cons of using a tree parser (such as XML::Parser::Simple) versus a streaming parser:

Upvotes: 3
