Reputation: 6350
I have XML that contains the tag <test>Value</test>. I want to get the value of this tag using a Perl regular expression.
Below is my XML sample:
<?xml version="1.0"?>
<t_volume>
<test>Value</test>
<info>
<info_name>FZGA34177.b1</info_name>
<center_project>4085729</center_project>
<base_file>SETARIA_ITALICA/JGI/fasta/FZGA34177.b1.fasta</base_file>
</info>
</t_volume>
I want to get the value of the tag <test>Value</test>. I tried the following, but I am not able to get the value:
$data = ($xml =~/<test>(.*?)<\/test>/i);
The XML can also contain an element like this:
<Test RequestId="1" RequestorId="test" ResponderId="Test">
How can I get the value of RequestorId?
Upvotes: 1
Views: 3162
Reputation: 53498
Regular expressions are a bad idea for use with XML, because regular expressions are not contextual, whereas XML is. The problem is that there are many semantically identical pieces of XML that can legitimately vary and will trip up a regex. You create brittle code by doing so, because it might one day break because of an upstream (legitimate, within spec) change.
E.g.:
<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test">
</Test>
</root>
Or:
<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test"></Test>
</root>
Or:
<root>
<Test
RequestId="1"
RequestorId="test"
ResponderId="Test"></Test>
</root>
Or:
<root
><Test
RequestId="1"
RequestorId="test"
ResponderId="Test"
></Test></root>
Or:
<root>
<Test RequestId="1" RequestorId="test" ResponderId="Test"/>
</root>
These are all semantically identical, but I'm pretty sure you'd be hard pressed to write a regex that safely handles all of the above (and any others that you may run into).
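To make the brittleness concrete, here is a minimal sketch using only core Perl. The pattern and the names naive_requestor_id and %variants are illustrative assumptions, not code from the question; the sketch runs one plausible single-line regex against three of the variants above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Three of the semantically identical variants shown above.
my %variants = (
    one_line     => qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test"></Test>\n</root>},
    multi_line   => qq{<root>\n<Test\nRequestId="1"\nRequestorId="test"\nResponderId="Test"></Test>\n</root>},
    self_closing => qq{<root>\n<Test RequestId="1" RequestorId="test" ResponderId="Test"/>\n</root>},
);

# A naive pattern (hypothetical): it expects a literal space after the
# tag name, everything on one line, and an explicit </Test> closing tag.
sub naive_requestor_id {
    my ($xml) = @_;
    my ($id) = $xml =~ /<Test .*RequestorId="(.*?)".*><\/Test>/;
    return $id;    # undef when the pattern does not match
}

for my $name ( sort keys %variants ) {
    my $id = naive_requestor_id( $variants{$name} );
    printf "%-12s %s\n", $name, defined $id ? "matched '$id'" : 'no match';
}
```

Only the one-line variant matches; the multi-line and self-closing forms carry the same information but are missed. Each of them could be patched into the pattern, but every patch makes the regex harder to read and still leaves other legal layouts uncovered.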
And additionally, a wildcard-based regex can be confused by other Test elements in the document, and if a <Test> element has subelements, then because you're wildcarding, the regex can catch those rather than the attributes.
Fortunately, you have an alternative: XPath, a way of defining an expression that works a bit like a regex, but in an XML-aware way.
I would suggest XML::Twig, as it doesn't have a particularly steep learning curve. For your first:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' );
print $twig -> get_xpath('//test',0) -> text;
For your second:
print $twig -> get_xpath('//Test',0) -> att('RequestorId');
This can one-liner-ify as:
perl -MXML::Twig -0777 -e 'print XML::Twig -> parse ( <> ) -> get_xpath("//test",0) -> text' yourfile
Upvotes: 2
Reputation: 23870
The match $xml =~/<test>(.*?)<\/test>/i returns only a success flag (1 or an empty string) in scalar context, which is what your assignment to $data uses. In list context it returns a list of all the captured groups, so you need to do something like this:
($data) = $xml =~/<test>(.*?)<\/test>/i;
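The difference between the two contexts can be seen in a short sketch. The sample string and the variable name $flag are assumptions made for illustration; the pattern is the one from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $xml = '<t_volume><test>Value</test></t_volume>';

# Scalar assignment: the match yields only a success flag.
# The parentheses around the match do not create list context.
my $flag = ( $xml =~ /<test>(.*?)<\/test>/i );

# List assignment: the parentheses on the left-hand side put the
# match in list context, so it yields the captured groups.
my ($data) = $xml =~ /<test>(.*?)<\/test>/i;

print "scalar context: $flag\n";    # prints 1
print "list context:   $data\n";    # prints Value
```

What matters is the parentheses around the variable on the left-hand side, not around the match on the right.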
Edit: For the second example, you can similarly extract the information if you capture it with a set of parentheses:
($RequestorId) = $xml =~ /<Test [^>]*\bRequestorId="([^"]*)"/;
Upvotes: 2
Reputation: 241968
Don't use regular expressions to parse XML. Use a proper XML handling tool, e.g. XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use XML::LibXML;
my $dom = 'XML::LibXML'->load_xml( location => shift );
my $data = $dom->findvalue('t_volume/test');
say $data;
my $requestor_id = $dom->findvalue('//Test/@RequestorId');
say $requestor_id;
Upvotes: 2