tester

Reputation: 223

Extracting XML values using Perl

I have an XML file containing the following data:

<User text="HHd5">
    <max string="0"/>
    <min string="pick up"/>
    <valat string="0"/>
    <valon string="0"/>
    <time string="GMT"/>
</User>

Through my script, I need to check for the User element whose text attribute is HHd5. If it is found, I must extract the valat and valon values. Please help.

My code:

$file = "text.xml" 
$xml = new XML::Simple( KeyAttr => [] );
$data = $xml->XMLin("$file");
my $booklist = XMLin('$file');
foreach my $var ( @{ $booklist->{ User text } } ) {
    if ( $var->{ User text } eq "HHd5" ) { $var->{valat}; $var->{valon}; }

And:

#!/usr/bin/perl 
open( fp, "<", "testing.xml" );
$s = "HHd5";
while (<fp>) {
    $a = $_;
    if ( $a =~ /$s/ ) {
        while (<fp>) {
            $f = $_;
            if ( $f =~ /valon string="(\d+)/ ) { print "valon $1 \n"; }
            if ( $f =~ /valat string="(\d+)/ ) { print "valat $1 \n"; }
        }
    }
}

Upvotes: 0

Views: 3950

Answers (3)

Sobrique

Reputation: 53498

Let me start with a personal peeve. XML is a strict language spec, with formal definitions of what is - and isn't - allowed. That makes it very easy to parse with a parser, and horribly messy if you try to use a hand-rolled solution like a regular expression.

Not least because XML can have linefeeds and be reformatted and still be valid.
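To make the linefeed point concrete, here is a minimal sketch in core Perl (no modules needed). The hypothetical valat_line_by_line helper reduces the question's line-by-line regex to a subroutine. It finds the value in the sample as posted, but not once the same element is reformatted across two lines, which is still perfectly valid XML with the same meaning.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's line-by-line regex, reduced to a sub.
# It only finds valat when the tag name and its attribute share one line.
sub valat_line_by_line {
    my ($xml) = @_;
    for my $line ( split /\n/, $xml ) {
        return $1 if $line =~ /valat string="(\d+)"/;
    }
    return;    # not found
}

# A cut-down version of the sample from the question.
my $compact = qq{<User text="HHd5">\n  <valat string="0"/>\n</User>\n};

# The same element, reformatted: still valid XML, same data.
my $reformatted = qq{<User text="HHd5">\n  <valat\n     string="0"/>\n</User>\n};

for my $doc ( $compact, $reformatted ) {
    my $v = valat_line_by_line($doc);
    print defined $v ? "valat = $v\n" : "valat not found\n";
}
```

A real parser treats both documents identically, which is the point of the advice above.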

I would also suggest - don't use XML::Simple. Its module page says:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.

Also - it's really important that you start every script with use strict and use warnings. They help you diagnose problems, and you'll also get much better responses if you're posting code on Stack Overflow.
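A small sketch of what use strict buys you: the snippet below mirrors the first line of the question's code (with the missing semicolon added) and compiles it via string eval, once with strict turned off and once with it on. Only the strict version rejects the undeclared $file.

```perl
use strict;
use warnings;

# The question's first line, semicolon added; "1;" makes a successful eval return true.
my $snippet = q{$file = "text.xml"; 1;};

# Compile and run the snippet with and without strict vars.
my $lax_ok    = eval "no strict; $snippet";
my $strict_ok = eval "use strict; $snippet";
my $error     = $@;    # the compile error from the strict attempt

print "without strict: ", ( $lax_ok    ? "accepted" : "rejected" ), "\n";
print "with strict:    ", ( $strict_ok ? "accepted" : "rejected" ), "\n";
print "error: $error" unless $strict_ok;
```

The fix strict is pointing at is simply to declare the variable: my $file = "text.xml";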

With that in mind, I'd suggest picking up XML::Twig, which lets you set twig_handlers - subroutines that are triggered to process a specific chunk of XML. In the example below, I specify twig_roots, which tells the parser to build only the listed elements and skip everything else.

process_user is called once for each User element. We check whether the User element has the appropriate text attribute and, if it does, extract the string attribute from the two child elements you're interested in.

Something like this:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

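# Called by XML::Twig once for each <User> element it finishes parsing.
sub process_user {
    my ( $twig, $user ) = @_;
    # "// ''" avoids an uninitialized-value warning if text is missing
    if ( ( $user->att('text') // '' ) eq "HHd5" ) {
        # Print the string attribute of the valat and valon children.
        print $user->first_child('valat')->att('string'), ":",
            $user->first_child('valon')->att('string'), "\n";
    }
}

# twig_roots: build only User elements; everything else is skipped.
my $parser = XML::Twig->new( twig_roots => { 'User' => \&process_user, } );
$parser->parse( \*DATA );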

__DATA__
<User text="HHd5">
    <max string="0"/>
    <min string="pick up"/>
    <valat string="0"/>
    <valon string="0"/>
    <time string="GMT"/>
</User>

Or, simplifying a bit to make it closer to your existing code:

use strict;
use warnings;

use XML::Twig;

# Parse the whole file into memory.
my $xml_twig = XML::Twig->new();
$xml_twig->parsefile("test.xml");

# Check each User element directly below the root element.
foreach my $user ( $xml_twig->root->children('User') ) {
    if ( ( $user->att('text') // '' ) eq "HHd5" ) {
        print $user->first_child('valat')->att('string');
        print ":";
        print $user->first_child('valon')->att('string'), "\n";
    }
}

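(NB: The example above doesn't quite work with your XML snippet as posted, because it assumes that User isn't the root node of your XML. It almost certainly isn't: an XML document has exactly one root element, so a file holding several User records has to wrap them in some parent element.)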

Upvotes: 2

beasy

Reputation: 1227

To parse XML, the best approach is to install a module from CPAN, such as XML::Simple. If you are going to work with XML, it is worth your time to get this module (or one like it) and learn how to use it. XML::Simple, for example, converts XML into a nested Perl data structure (a hash reference). Parsing XML by hand is not advisable on a large scale.
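To show what "a hash reference" means for the sample file, here is roughly the structure XML::Simple's XMLin returns for it, assuming KeyAttr => [] and otherwise default options (the root User element is folded away, so its attributes become top-level keys). The structure is written out by hand here so the sketch runs without XML::Simple installed; dumping the real result with Data::Dumper is the way to confirm it for your file.

```perl
use strict;
use warnings;

# Assumed shape of XMLin's result for the sample (hand-written, not generated).
my $data = {
    text  => 'HHd5',                    # attribute of the root <User>
    max   => { string => '0' },         # each child element becomes a nested hash
    min   => { string => 'pick up' },
    valat => { string => '0' },
    valon => { string => '0' },
    time  => { string => 'GMT' },
};

# The attribute key is "text", not "User text".
if ( $data->{text} eq 'HHd5' ) {
    print "valat: $data->{valat}{string}\n";
    print "valon: $data->{valon}{string}\n";
}
```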

However, for a quick ad-hoc job, you could parse it with regular expressions:

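open( my $xml, "<", "file.xml" ) or die "Cannot open file.xml: $!";

my ( $user, $valat, $valon );
while ( my $line = <$xml> ) {
    # Assumes one tag per line, as in the sample XML
    $user = $1 if $line =~ /<User\s+text="([^"]*)"/;
    next unless defined $user && $user eq 'HHd5';
    $valat = $1 if $line =~ /<valat\s+string="([^"]*)"/;
    $valon = $1 if $line =~ /<valon\s+string="([^"]*)"/;
}
print "valat $valat\nvalon $valon\n" if defined $valat && defined $valon;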
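The loop above depends on each tag sitting on a single line. A minimal variation, still a regex hack rather than a parser, reads the whole document at once and matches across line breaks with the /s modifier. The extract_vals helper is hypothetical, and it assumes double-quoted attributes, text as the first attribute of User, no nested User elements, and no comments or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical helper: return (valat, valon) for the User whose text attribute
# equals $wanted, or an empty list if there is no such User.
sub extract_vals {
    my ( $xml, $wanted ) = @_;
    # /s lets .*? span newlines; /g walks through every User element in turn.
    while ( $xml =~ m{<User\s+text="([^"]*)"\s*>(.*?)</User>}gs ) {
        my ( $text, $body ) = ( $1, $2 );
        next unless $text eq $wanted;
        my ($valat) = $body =~ m{<valat\s+string="([^"]*)"};
        my ($valon) = $body =~ m{<valon\s+string="([^"]*)"};
        return ( $valat, $valon );
    }
    return;
}

# To read a real file in one go (local $/ disables line splitting):
# my $xml = do { local $/; open my $fh, '<', 'file.xml' or die "file.xml: $!"; <$fh> };

# The question's sample, with valat deliberately split across two lines.
my $sample = <<'XML';
<User text="HHd5">
         <max string="0"/>
         <min string="pick up"/>
         <valat
             string="0"/>
         <valon string="0"/>
         <time string="GMT"/>
 </User>
XML

my ( $valat, $valon ) = extract_vals( $sample, 'HHd5' );
print "valat $valat\nvalon $valon\n" if defined $valat;
```

Anything beyond a one-off job (single-quoted attributes, reordered attributes, comments) is where a real parser pays for itself.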

Upvotes: -1

choroba

Reputation: 242123

Using XML::XSH2, a wrapper around XML::LibXML:

open file.xml ;
for //User[@text='HHd5']
    echo valat/@string valon/@string ;

Or, a more verbose solution using XML::LibXML only:

#! /usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $xml = 'XML::LibXML'->load_xml( location => 'file.xml' );
for my $user ($xml->documentElement->findnodes('//User[@text="HHd5"]')) {
    print $_->{string},"\n" for $user->findnodes('valat | valon');
}

Upvotes: 4
