How can I use Perl's XML::LibXML to extract content between tags?

Question

I have an XML file with content like this:

<Node id="7"/>www<Node id="8"/>

How can I use XML::LibXML and Perl to get the content between the two nodes, i.e. "www"?

Thank you.

ikegami · Accepted Answer

The XML format you have to deal with is awful!*

Given a node, you want the text nodes that are its siblings and immediately follow it (skipping any intervening comments).

use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( XML_COMMENT_NODE XML_TEXT_NODE );

# Concatenate the text nodes that directly follow $node, skipping comments.
sub following_text {
   my ($node) = @_;
   my $text = '';
   while ($node = $node->nextSibling()) {
      my $node_type = $node->nodeType();
      next if $node_type == XML_COMMENT_NODE;   # ignore intervening comments
      last if $node_type != XML_TEXT_NODE;      # stop at anything else, e.g. the next element
      $text .= $node->data();
   }

   return $text;
}

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_fh(\*DATA);
my $root   = $doc->documentElement();
my ($node) = $root->findnodes('//Node[@id="7"]');
die "Node with id 7 not found\n" if !$node;
my $text   = following_text($node);

say $text;

__DATA__
<root>
<Node id="7"/>
www
<Node id="8"/>
bar
</root>

* The text www should be a child of Node. For example, <Node id="7">www</Node> would be better.
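
To illustrate why the nested form is easier to work with, here is a minimal sketch that reads the text when it is a child of Node. The sample document, including the root element name and the second Node, is made up for illustration and is not taken from the question.

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML;

# Hypothetical document in the restructured format: the text is a child of Node.
my $xml = <<'XML';
<root>
<Node id="7">www</Node>
<Node id="8">bar</Node>
</root>
XML

my $doc    = XML::LibXML->load_xml( string => $xml );
my ($node) = $doc->findnodes('//Node[@id="7"]');
die "Node with id 7 not found\n" if !$node;

# textContent() returns the concatenated text of the node's descendants,
# so no sibling walking is needed.
my $text = $node->textContent();
say $text;
```

With this layout the sibling-walking helper is unnecessary, since the element itself delimits the text.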
