Developer

Reputation: 6350

Get an XML tag value with a regular expression in Perl

I have XML that contains the tag <test>Value</test>, and I want to get the value of that tag using a Perl regular expression. Below is my XML sample:

<?xml version="1.0"?>
<t_volume>
<test>Value</test>
<info>
<info_name>FZGA34177.b1</info_name>
<center_project>4085729</center_project>
<base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
</info>
</t_volume>

I want to get the value of the tag <test>Value</test>. I tried the following, but I am not able to get the value:

$data = ($xml =~/<test>(.*?)<\/test>/i);

The XML I receive also contains elements like this:

<Test RequestId="1" RequestorId="test" ResponderId="Test">

How can I get the value of RequestorId?

Upvotes: 1

Views: 3162

Answers (3)

Sobrique

Reputation: 53498

Regular expressions are a bad idea for use with XML, because regular expressions are not contextual, whereas XML is. The problem is that many semantically identical pieces of XML can legitimately vary in layout, and any of those variations can trip up a regex. The resulting code is brittle: it may break one day because of a legitimate, within-spec change upstream.

E.g.:

<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test">
</Test>
</root>

Or:

<root>
  <Test RequestId="1" RequestorId="test" ResponderId="Test"></Test>
</root>

Or:

<root>
  <Test
      RequestId="1"
      RequestorId="test"
      ResponderId="Test"></Test>
</root>

Or:

<root
><Test
RequestId="1"
RequestorId="test"
ResponderId="Test"
></Test></root>

Or:

<root>
  <Test RequestId="1" RequestorId="test" ResponderId="Test"/>
</root>

These are all semantically identical, but I'm pretty sure you'd be hard pressed to write a regex that safely handles all of the above (and any other variants you may run into).

Additionally, a regex has to cope with:

  • A similar match elsewhere in the document tree (there can be many Test elements).
  • Changes in attribute ordering or presence, so that matches no longer work.
  • A <Test> element with subelements: because you're wildcarding, the regex may catch those rather than the attributes.
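To illustrate the brittleness, here is a minimal sketch that runs one simple attribute-matching pattern against three of the equivalent layouts shown above. The pattern and the hash keys are assumptions chosen for illustration, not a recommended approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pattern, chosen only to illustrate the problem:
# it expects a single space directly after the tag name.
my $re = qr/<Test [^>]*\bRequestorId="([^"]*)"/;

# Equivalent serializations of the same element.
my %variant = (
    one_line     => qq{<root><Test RequestId="1" RequestorId="test" ResponderId="Test"></Test></root>},
    self_closing => qq{<root><Test RequestId="1" RequestorId="test" ResponderId="Test"/></root>},
    multi_line   => qq{<root>\n  <Test\n      RequestId="1"\n      RequestorId="test"\n      ResponderId="Test"></Test>\n</root>},
);

for my $name ( sort keys %variant ) {
    # List context: $id receives the captured group, or undef if no match.
    my ($id) = $variant{$name} =~ $re;
    printf "%-12s %s\n", $name, defined $id ? "matched '$id'" : 'no match';
}
```

The one-line and self-closing variants match, but the multi-line variant does not, because a line break rather than a space follows the tag name. The pattern can be patched for that case, but each further legitimate variation needs yet another patch.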

Fortunately, there is an alternative: XPath, a way of defining an expression that works a bit like a regex, but in an XML-aware way.

I would suggest XML::Twig, as it doesn't have a particularly steep learning curve. For your first question:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' ); 

print $twig -> get_xpath('//test',0) -> text;

For your second question:

print $twig -> get_xpath('//Test',0) -> att('RequestorId');

This can be turned into a one-liner:

perl -MXML::Twig -0777 -e 'print XML::Twig -> parse ( <> ) -> get_xpath("//test",0) -> text' yourfile

Upvotes: 2

redneb

Reputation: 23870

In scalar context, the expression $xml =~/<test>(.*?)<\/test>/i only returns whether the match succeeded, which is why $data does not receive the tag value. Evaluated in list context, it returns a list of all the captured groups. So you need to do something like this:

($data) = $xml =~/<test>(.*?)<\/test>/i;
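To make the difference visible, the following sketch runs the same match in scalar context and in list context. The sample string and the variable name $matched are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input, assumed for illustration only.
my $xml = '<t_volume><test>Value</test></t_volume>';

# Scalar context: the match returns 1 on success, not the captured text.
my $matched = ( $xml =~ /<test>(.*?)<\/test>/i );

# List context: the match returns the list of captured groups.
my ($data) = $xml =~ /<test>(.*?)<\/test>/i;

print "scalar context: $matched\n";    # prints "scalar context: 1"
print "list context: $data\n";         # prints "list context: Value"
```

The parentheses around ($data) on the left-hand side are what put the assignment into list context.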

Edit: For the second example, you can similarly extract the information if you capture it with a set of parentheses:

($RequestorId) = $xml =~ /<Test [^>]*\bRequestorId="([^"]*)"/;

Upvotes: 2

choroba

Reputation: 241968

Don't use regular expressions to parse XML. Use a proper XML handling tool, e.g. XML::LibXML:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::LibXML;

my $dom = 'XML::LibXML'->load_xml( location => shift );

my $data = $dom->findvalue('t_volume/test');
say $data;

my $requestor_id = $dom->findvalue('//Test/@RequestorId');
say $requestor_id;
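If the document contains more than one Test element, findnodes can be used to visit each of them. The following is a minimal sketch, assuming a small inline sample document (not the asker's real data) that is parsed from a string instead of a file:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use XML::LibXML;

# Sample document, assumed for illustration; it holds two Test elements
# written in different but equivalent layouts.
my $xml = <<'XML';
<root>
  <Test RequestId="1" RequestorId="test" ResponderId="Test"/>
  <Test
      RequestId="2"
      RequestorId="other"
      ResponderId="Test"></Test>
</root>
XML

my $dom = 'XML::LibXML'->load_xml( string => $xml );

# findnodes returns every matching element, in document order.
my @requestor_ids = map { $_->getAttribute('RequestorId') } $dom->findnodes('//Test');

say for @requestor_ids;
```

Both elements are found regardless of how their attributes are laid out, which is exactly the case where a regex tends to break.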

Upvotes: 2
