Wanexa

Reputation: 77

Perl XML::Twig - Extract field with similar tag

I'm trying to parse a huge XML file that contains several similarly named tags. So far I can only read the first tag and its first child.

Here is an example of the xml:

<?xml version="1.0" encoding="UTF-8"?>
<test version="1.0">
  <parameters/>
  <category name="z1" description="jobs currently running" count="30" timestamp="2010-01-16T14:24:31">
    <jobs name="ZEI018CL" owner="A" type="auto" activityLevel="147" threadId="202" pid="20521" vmName="[email protected]:6102:xxx" cpuUsage="0"/>
    <job name="ZUA002B" owner="A" type="auto" activityLevel="3375" threadId="194" pid="20521" vmName="[email protected]:6102:xxx" cpuUsage="0"/>
    <job name="ZZZ855" owner="A" type="auto" activityLevel="0" threadId="107" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZKA019CL" owner="A" type="auto" activityLevel="0" threadId="105" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZIN41B" owner="A" type="auto" activityLevel="3" threadId="104" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZIN198CL" owner="A" type="auto" activityLevel="0" threadId="103" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZHO060" owner="A" type="auto" activityLevel="61" threadId="102" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZEI019CL" owner="A" type="auto" activityLevel="0" threadId="101" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZEI013CL" owner="A" type="auto" activityLevel="0" threadId="99" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZEI011CL" owner="A" type="auto" activityLevel="0" threadId="98" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZEC007CL" owner="A" type="auto" activityLevel="0" threadId="97" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/>
    <job name="ZEC001B" owner="A" type="auto" activityLevel="2" threadId="96" pid="20457" vmName="[email protected]:6101:xxx" cpuUsage="0"/></category>
 <category name="z3" description="Batchjobs" count="0" timestamp="2015-01-16T14:24:31"/>
  <category name="z4" description="Interactivejobs jobs currently running in the system" count="498" timestamp="2015-01-16T14:24:31">
    <job name="CAS" owner="PA" type="interactive" activityLevel="0" threadId="14624" pid="23771" vmName="[email protected]:6104:xxx" cpuUsage="0"/>
    <job name="CR" owner="K" type="interactive" activityLevel="0" threadId="14586" pid="23771" vmName="[email protected]:6104:xxx" cpuUsage="0"/>
    <job name="MM" owner="DU" type="interactive" activityLevel="0" threadId="14570" pid="23771" vmName="[email protected]:6104:xxx" cpuUsage="0"/>
    <job name="ZZ" owner="D" type="interactive" activityLevel="0" threadId="14568" pid="23771" vmName="[email protected]:6104:xxx" cpuUsage="0"/></category>
 <category name="services" description="The status" timestamp="2015-01-16T14:24:31">
    <service name="1" description="test1" port-status="up" thread-status="up"/>
    <service name="2" description="test2" port-status="up" thread-status="up"/>
    <service name="3" description="test3" port-status="N/A" thread-status="up"/>
    <service name="4" description="test4" port-status="up" thread-status="up"/></category></test>

I create the parser and parse the file like this:

my $parser = XML::Twig->new();
$parser->parsefile($xml);

For the first line I use

my $count = $parser->root->first_child('category')->att('count');
print $count;

For the next line I use this one:

my $service = $parser->root->first_child('category')->first_child('job')->att('name');
print $service;

But I can't figure out how to get the port-status for a specific service name, or, for a specific job name, the type in the second tag.

Can you help me?

Upvotes: 0

Views: 434

Answers (2)

Patrick J. S.

Reputation: 2935

My guess is you want something like this:

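foreach ($parser->root->first_child('category[@name="services"]')->children('service[@name="1"]')) {
  print join(", ", @{$_->atts}{'port-status', 'thread-status'}), "\n";
}

With first_child('category[@name="services"]') you select the services category, and with children('service[@name="1"]') you get all of its service elements whose name attribute is 1.

Then the atts method gives you a hash reference to that element's attributes, and the hash slice extracts port-status and thread-status.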

Edit: sorry, fixed. I forgot that children returns more than one element.

Upvotes: 0

mirod

Reputation: 16171

In your case the easiest is probably to use XPath to get what you want:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig::XPath;

my( $service, $infile)= @ARGV;

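# build the twig with XML::Twig::XPath so that full XPath
# (including the attribute step used further down) is available
my $t= XML::Twig::XPath->new()
                       ->parsefile( $infile);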

# get the service first, then the attribute
# note the \@'s, where Perl and XPath syntaxes collide
my @services= $t->findnodes( qq{//service[\@name="$service"]});
my $status= $services[0]->att( 'port-status');
print "status: $status\n";

# get it in a single XPath query
my $status2= $t->findvalue( qq{//service[\@name="$service"]/\@port-status});
print "status: $status2\n";

If your XML file is really huge, and depending on what you need to do, there may be better alternatives though, using handlers. It's hard to tell from your example.
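As a rough illustration of the handler approach mentioned above, here is a minimal sketch, assuming you only need the port-status of one service. The helper name port_status_for is made up for this example; twig_handlers, att and purge are regular XML::Twig features. Each handler runs as soon as its element has been parsed, and purge frees what has been read so far, so the whole document never has to sit in memory.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical helper (not part of XML::Twig): returns the port-status
# of the service called $wanted, or undef if there is no such service.
sub port_status_for {
    my( $infile, $wanted)= @_;
    my $status;

    my $t= XML::Twig->new(
        twig_handlers => {
            # called each time a <service> element has been fully parsed
            service => sub {
                my( $twig, $elt)= @_;
                if( !defined $status && ($elt->att( 'name') // '') eq $wanted) {
                    $status= $elt->att( 'port-status');
                }
                # release the memory used by everything parsed so far
                $twig->purge;
            },
            # <job> elements are not needed here, discard them as they come
            job => sub { $_[0]->purge; },
        },
    );
    $t->parsefile( $infile);

    return $status;
}

# same argument order as the script above: service name, then file
my( $service, $infile)= @ARGV;
if( defined $infile) {
    my $status= port_status_for( $infile, $service);
    print "status: ", (defined $status ? $status : 'not found'), "\n";
}
```

The same pattern should work for the job question: put the test in a job handler and read att( 'type') instead. Whether this beats the XPath version depends on how big the file really is.';
close $fh;
die "expected N/A" unless port_status_for( $path, '3') eq 'N/A';
die "expected up" unless port_status_for( $path, '1') eq 'up';
die "expected undef" if defined port_status_for( $path, '9');
</test>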

Upvotes: 1
