Use HTML::TreeBuilder in Perl to extract all instances of a specific span class

Question

I'm trying to write a Perl script that opens an HTML file and extracts everything contained within <span class="postertrip"> tags.

Sample HTML:


   
      
         >>
         
               Test1!AAAAAAAA  08/01/03(Thu)02:06  No.2    
 File: 1199326003295.jpg -(65843 B, 288x412) Thumbnail displayed, click image for full size.
       
            
               Test message 1
            
         
      
   


   
      
         >>
         
                Test2!BBBBBBBB 08/01/03(Thu)16:12  No.5    
            
               Test message 2
            
         
      
   


   
      
         >>
         
                Test3!CCCCCCCC. 08/01/01(Tue)17:53  No.7    
            
               Test message 3

Desired output:

!AAAAAAAA
!BBBBBBBB
!CCCCCCCC

Current script:

#!/usr/bin/env perl

use warnings;
use strict;
use 5.010;

use HTML::TreeBuilder;


open(my $html, "<", "temp.html")
        or die "Can't open";


my $tree = HTML::TreeBuilder->new();
$tree->parse_file($html);


foreach my $e ($tree->look_down('class', 'reply')) {
    my $e = $tree->look_down('class', 'postertrip');
    say $e->as_text;
}

Bad output of script:

!AAAAAAAA
!AAAAAAAA
!AAAAAAAA

Georg Mavridis · Accepted Answer

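In your foreach loop you have to call look_down on the element you found, not on $tree. In scalar context, $tree->look_down returns only the first matching element in the whole document, so every iteration finds the same !AAAAAAAA span. The inner "my $e" also hides the loop variable, so the reply element you matched was never used. The corrected code is: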

foreach my $parent ($tree->look_down('class', 'reply')) {
    my $e = $parent->look_down('class', 'postertrip');
    say $e->as_text;
}
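For reference, here is a self-contained sketch of the accepted fix. The sample's tags did not survive, so the markup in the heredoc is a hypothetical minimal stand-in: it assumes each post is an element with class "reply" that contains a <span class="postertrip">. The helper names trips_per_reply and all_trips are illustrative, not part of HTML::TreeBuilder. all_trips shows a shorter alternative: if you only need the trip codes, one list-context look_down on the whole tree already returns every match in document order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

use HTML::TreeBuilder;

# Hypothetical markup (assumption): one class="reply" cell per post,
# with the trip code inside a <span class="postertrip">.
my $html = <<'HTML';
<table><tr><td class="reply">Test1<span class="postertrip">!AAAAAAAA</span>
<blockquote>Test message 1</blockquote></td></tr></table>
<table><tr><td class="reply">Test2<span class="postertrip">!BBBBBBBB</span>
<blockquote>Test message 2</blockquote></td></tr></table>
<table><tr><td class="reply">Test3<span class="postertrip">!CCCCCCCC</span>
<blockquote>Test message 3</blockquote></td></tr></table>
HTML

# Hypothetical helper: walk each reply, then search only below it.
sub trips_per_reply {
    my ($tree) = @_;
    my @trips;
    # List context: every element whose class attribute is exactly "reply".
    foreach my $reply ($tree->look_down(class => 'reply')) {
        # Scalar context on $reply: first postertrip span inside this reply only.
        my $trip = $reply->look_down(_tag => 'span', class => 'postertrip');
        push @trips, $trip->as_text if $trip;    # skip replies without a trip
    }
    return @trips;
}

# Hypothetical helper: one list-context search over the whole document.
sub all_trips {
    my ($tree) = @_;
    return map { $_->as_text }
        $tree->look_down(_tag => 'span', class => 'postertrip');
}

my $tree = HTML::TreeBuilder->new_from_content($html);
say for trips_per_reply($tree);
$tree->delete;    # free the tree's circular references
```

Note that class => 'reply' matches the attribute value exactly; if the real page uses several classes on one element (e.g. class="reply highlight"), passing a regex such as class => qr/\breply\b/ to look_down is one way to match it.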
