j1nrg

Reputation: 116

Extracting data from XML tag

I am trying to extract the values from this xml file but can't seem to do it...

<?xml version="1.0" encoding="UTF-8"?>
<personReport xmlns="...">          
    <header>
        <creation>2016-10-15</creation>
    </header>
    <details>
    ...
    </details>
    <person id="person1">
        <personId personIdScheme="name of">joe</personId>
    </person>
    <person id="person2">
        <personId personIdScheme="name of">sam</personId>
    </person>
</personReport>

I am successfully able to extract data from other tags (such as header) using:

my $xml = XMLin($xml_file);
my $header = $xml->{header}
                  ->{creation};

I am trying to do the same thing but get the data (joe) out of <person>...

my $person_type = $xml->{personReport}
                      ->{person1}[1];

Any idea why this isn't working?

Upvotes: 1

Views: 239

Answers (2)

Borodin

Reputation: 126752

Pretty much any XML module is superior to XML::Simple, which is anything but simple to use.

XML::LibXML and XML::Twig are excellent and popular, and both allow you to address the XML document using XPath expressions. Here's a solution using XML::Twig:

use strict;
use warnings 'all';
use feature 'say';

use XML::Twig;

my $twig = XML::Twig->new;
$twig->parsefile( 'personReport.xml' );

say $twig->findvalue('/personReport/header/creation');

for my $person ($twig->findnodes('/personReport/person') ) {

    my $id = $person->att('id');
    my $name = $person->findvalue('personId[@personIdScheme="name of"]');

    say "$id $name";
}

output

2016-10-15
person1 joe
person2 sam

Upvotes: 1

ikegami

Reputation: 386541

$xml->{personReport}{person1}[1]

should be

$xml->{person}{person1}{personId}{content}

If you don't understand why, perhaps you shouldn't be using a module so complex that its author discourages its use.

STATUS OF THIS MODULE

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended and XML::Twig is an excellent alternative.
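
To see why the corrected path works, it can help to look at the shape of the structure XMLin builds. The following is a minimal sketch, assuming XML::Simple's default options: the root element (personReport) is dropped, and the repeated person elements are folded into a hash keyed on their id attribute. The hash below is written by hand as a stand-in for the XMLin result, and the xmlns and details entries are left out for brevity.

```perl
use strict;
use warnings;

# Hand-written stand-in for what XMLin() returns for the sample document
# under default options (an assumption; XML::Simple is not run here).
my $xml = {
    header => { creation => '2016-10-15' },
    person => {
        person1 => { personId => { personIdScheme => 'name of', content => 'joe' } },
        person2 => { personId => { personIdScheme => 'name of', content => 'sam' } },
    },
};

# "personReport" is not a key at all, and "person1" is a hash key under
# "person", not an array, so there is nothing to index with [1].
my $name = $xml->{person}{person1}{personId}{content};
print "$name\n";    # prints "joe"
```

Dumping the real structure with Data::Dumper is a quick way to check whether your own file follows the same layout.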


Finding the name of each person using XML::Simple:

# Assumes each person element will have at least one personId child.
# Assumes each personId element will have a personIdScheme attribute.

for my $person (values(%{ $xml->{person} })) {
   # personId is a hash ref when there is one child, an array ref when there are several.
   my @data_nodes = ref($person->{personId}) eq 'ARRAY'
      ? @{ $person->{personId} }
      : $person->{personId};

   my ($name_data_node) = grep { $_->{personIdScheme} eq 'name of' } @data_nodes;

   my $name = $name_data_node->{content};
   ...
}

Finding the name of each person using XML::LibXML:

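# Assumes $doc is a parsed XML::LibXML document whose elements are in no namespace.
# If the root declares a default xmlns, register a prefix with XML::LibXML::XPathContext
# and use that prefix in the paths.
for my $person_node ($doc->findnodes('/personReport/person')) {
   my $name = $person_node->findvalue('personId[@personIdScheme="name of"]');
   ...
}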

Upvotes: 1
