dasen

Reputation: 377

How do I select a specific sub-node of an XML file using XML::Twig?

I have an XML file:

<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "56.dtd">
<body>
<tu changedate="20130625T175037Z">
  <tuv xml:lang="pt-pt">
    <prop type="x-context-pre">&lt;seg&gt;Some text.&lt;/seg&gt;</prop>
    <prop type="x-context-post">&lt;seg&gt;Other text.&lt;/seg&gt;</prop>
    <seg>The text I'm interested in.</seg>
  </tuv>
  <tuv xml:lang="it">
    <seg>And its translation in Italian.</seg>
  </tuv>
</tu>

<!-- ... followed by other <tu> elements -->
</body>

Since it's a huge file, I'm using XML::Twig to parse it and extract only the parts I need: the text content of the seg nodes and the changedate attribute of the tu nodes.

Here's the code I've got so far:

use 5.010;
use strict;
use warnings;
use XML::Twig;

my $filename     = 'filename.tmx';
my $out_filename = 'out.xml';
open my $out, '>', $out_filename or die "Cannot open $out_filename: $!";
binmode $out;

my $original_twig = XML::Twig->new(
    pretty_print  => 'nsgmls',
    twig_handlers => { tu => \&original_tu },
);
$original_twig->parsefile($filename);

sub original_tu {
    my ($twig, $original_tu) = @_;
    # This call dies with the error shown below
    my $original_seg = $original_tu->first_child('./tuv/seg')->text;
}

Perl (or rather XML::Twig) dies with this error:

wrong navigation condition './tuv/seg' ()

Does anyone know how to get the text of the seg node and the changedate attribute of the tu node?

Upvotes: 3

Views: 1234

Answers (3)

mirod

Reputation: 16161

You can't use a complete XPath expression with first_child, only a single XPath step (i.e. you can only go down one level).
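A minimal sketch of the one-level rule, run against a small hypothetical sample trimmed from the question's XML and held in a string (the UTF-16 declaration and the DOCTYPE are left out because the string is not UTF-16 and no external DTD is needed). It shows that first_child finds a direct child, returns undef for a grandchild, and dies on a multi-step path; the exact error text may vary between XML::Twig versions, so only the failure itself is checked.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data, trimmed from the question's XML.
my $xml = <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt"><seg>The text I'm interested in.</seg></tuv>
    <tuv xml:lang="it"><seg>And its translation in Italian.</seg></tuv>
  </tu>
</body>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);
my $tu = $twig->root->first_child('tu');

# One level down works: tuv is a direct child of tu.
my $lang = $tu->first_child('tuv')->att('xml:lang');

# seg is a grandchild of tu, so a one-level search does not find it.
my $direct = $tu->first_child('seg');

# A multi-step path is not a valid navigation condition, so this dies.
my $ok    = eval { $tu->first_child('./tuv/seg'); 1 };
my $error = $ok ? '' : $@;

print "first tuv: $lang\n";
print defined $direct ? "seg is a child of tu\n" : "seg is not a child of tu\n";
print "first_child('./tuv/seg') died: $error" unless $ok;
```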

To use an XPath expression, you need findnodes: my $original_seg = $original_tu->findnodes('./tuv/seg', 0)->text; (the 0 offset returns only the first element of the potential list of hits).

To access an attribute, use $original_tu->att('changedate').
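Put together in a tu handler, that could look roughly like the sketch below. The sample string is a hypothetical, trimmed version of the question's file, and the @rows array is just an illustrative way to collect results; the purge call is the usual XML::Twig idiom for keeping memory bounded on a huge file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data, trimmed from the question's XML.
my $xml = <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt"><seg>The text I'm interested in.</seg></tuv>
    <tuv xml:lang="it"><seg>And its translation in Italian.</seg></tuv>
  </tu>
</body>
XML

my @rows;    # collected [changedate, first seg text] pairs

my $twig = XML::Twig->new(
    twig_handlers => {
        tu => sub {
            my ($t, $tu) = @_;
            # With an offset, findnodes returns only the hit at that
            # position (0 = first) instead of the whole list.
            my $seg = $tu->findnodes('./tuv/seg', 0);
            push @rows, [ $tu->att('changedate'), $seg->text ];
            $t->purge;    # free the parts of the tree already processed
        },
    },
);
$twig->parse($xml);

printf "%s\t%s\n", @$_ for @rows;
```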

Upvotes: 1

choroba

Reputation: 241858

The condition passed to first_child cannot be an XPath expression. See https://metacpan.org/module/XML::Twig#cond for details. The method would be misnamed if it accepted one: first_child returns a child, but seg is a grandchild of tu.

You can use first_descendant('seg') instead.

To access the attribute, use the $original_tu->att('changedate') method.
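A short sketch of first_descendant in a tu handler, on a hypothetical sample trimmed from the question's XML. It also uses descendants (which returns every matching element in the subtree) to show that first_descendant simply picks the first seg in document order, here the pt-pt one.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data, trimmed from the question's XML.
my $xml = <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt"><seg>The text I'm interested in.</seg></tuv>
    <tuv xml:lang="it"><seg>And its translation in Italian.</seg></tuv>
  </tu>
</body>
XML

my ($date, $first_seg, @all_segs);

XML::Twig->new(
    twig_handlers => {
        tu => sub {
            my ($twig, $tu) = @_;
            $date = $tu->att('changedate');
            # first_descendant searches below tu at any depth, so it
            # reaches seg even though seg is a grandchild of tu.
            $first_seg = $tu->first_descendant('seg')->text;
            # descendants returns all matching elements in the subtree.
            @all_segs = map { $_->text } $tu->descendants('seg');
            $twig->purge;
        },
    },
)->parse($xml);

print "$date\n$first_seg\n", join("\n", @all_segs), "\n";
```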

Upvotes: 0

toolic

Reputation: 62037

Here is one way to access that node and attribute:

my $original_seg = $original_tu->first_child('tuv')->first_child('seg')->text;
my $date = $original_tu->att('changedate');
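Plugged back into the structure of the question's script, the chained first_child approach might look like the sketch below. To keep it runnable on its own, it writes a small hypothetical TMX-like file with File::Temp instead of reading the real filename.tmx, and collects results in an illustrative @pairs array. Note that the chain assumes every tu has a tuv with a seg; otherwise first_child returns undef and the next method call dies.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Twig;

# Write a small hypothetical input file; in the question this would be
# the real 'filename.tmx'.
my ($fh, $filename) = tempfile(SUFFIX => '.tmx', UNLINK => 1);
print {$fh} <<'XML';
<body>
  <tu changedate="20130625T175037Z">
    <tuv xml:lang="pt-pt"><seg>The text I'm interested in.</seg></tuv>
    <tuv xml:lang="it"><seg>And its translation in Italian.</seg></tuv>
  </tu>
  <tu changedate="20130626T090000Z">
    <tuv xml:lang="pt-pt"><seg>Second segment.</seg></tuv>
  </tu>
</body>
XML
close $fh or die "Cannot close $filename: $!";

my @pairs;    # [changedate, seg text] for each tu

my $original_twig = XML::Twig->new(twig_handlers => { tu => \&original_tu });
$original_twig->parsefile($filename);

sub original_tu {
    my ($twig, $original_tu) = @_;
    # Step down one level per call: tu -> tuv -> seg.
    my $original_seg = $original_tu->first_child('tuv')->first_child('seg')->text;
    my $date         = $original_tu->att('changedate');
    push @pairs, [ $date, $original_seg ];
    $twig->purge;    # keep memory usage flat on a huge file
}

print "$_->[0]\t$_->[1]\n" for @pairs;
```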

Upvotes: 2
