Cindrella

Reputation: 1721

Find a value in an HTML file using HTML::TreeBuilder

Below is the data in my HTML file. I want to extract values from it using HTML::TreeBuilder.

<table id="stats" cellpadding="0" cellspacing="0">
<tbody>
    <tr class="row-even">
        <td class="stats_left">Main Domain</td>
        <td class="stats_right"><b>myabcab.com</b></td>
    </tr>
    <tr class="row-odd">
        <td class="stats_left">Home Directory</td>
        <td class="stats_right">/home/abc</td>
    </tr>
    <tr class="row-even">
        <td class="stats_left">Last login from</td>
        <td class="stats_right">22.32.232.223&nbsp;</td>
    </tr>
    <tr class="row-odd">
        <td class="stats_left">Disk Space Usage</td>
        <td class="stats_right">30.2 / &#8734; MB<br>
        <div class="stats_progress_bar">
        <div class="cpanel_widget_progress_bar" title="0%"
            style="position: relative; width: 100%; height: 100%; padding: 0px; margin: 0px; border: 0px">
        </div>
        <div class="cpanel_widget_progress_bar_percent" style="display: none">0</div>
        </div>
        </td>
    </tr>
    <tr class="row-even">
        <td class="stats_left">Monthly Bandwidth Transfer</td>
        <td class="stats_right">0 / &#8734; MB<br>
        <div class="stats_progress_bar">
        <div class="cpanel_widget_progress_bar" title="0%"
            style="position: relative; width: 100%; height: 100%; padding: 0px; margin: 0px; border: 0px">
        </div>
        <div class="cpanel_widget_progress_bar_percent" style="display: none">0</div>
        </div>
        </td>
    </tr>
</tbody>
</table>

How can I find the "Disk Space Usage" value using HTML::TreeBuilder? Many of the <td> elements above share the same classes, so I can't select the cell by class alone.

Upvotes: 0

Views: 4019

Answers (1)

daotoad

Reputation: 27183

Find the <td> whose content matches, in this case "Disk Space Usage", and then move to the next <td> with right().

Once you have an element tree:

# $t is the HTML::TreeBuilder root, e.g. from HTML::TreeBuilder->new_from_file("index.html")
my $usage = $t->look_down(
    _tag => 'td',
    sub {
        $_[0]->as_trimmed_text() =~ /^Disk Space Usage$/
    }
)->right()->as_trimmed_text();

You may want to wrap that in an eval block: if look_down doesn't find a match it returns undef, and calling ->right() on undef dies.
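A minimal self-contained sketch of the snippet above plus that eval guard. It parses a trimmed copy of the question's table from a string with new_from_content instead of reading index.html (progress-bar markup left out), and the helper name stat_value and the missing label 'Inodes' are hypothetical, used only to show the no-match case.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed copy of the question's table (progress-bar markup left out).
my $t = HTML::TreeBuilder->new_from_content(<<'HTML');
<table id="stats"><tbody>
<tr class="row-odd"><td class="stats_left">Home Directory</td><td class="stats_right">/home/abc</td></tr>
<tr class="row-odd"><td class="stats_left">Disk Space Usage</td><td class="stats_right">30.2 / &#8734; MB<br></td></tr>
</tbody></table>
HTML

# Hypothetical helper: text of the cell to the right of the label cell.
# If look_down finds no label it returns undef, ->right() then dies,
# and the eval turns that into an undef return value.
sub stat_value {
    my ($root, $label) = @_;
    return eval {
        $root->look_down(
            _tag => 'td',
            sub { $_[0]->as_trimmed_text() eq $label },
        )->right()->as_trimmed_text();
    };
}

my $usage   = stat_value($t, 'Disk Space Usage');
my $missing = stat_value($t, 'Inodes');   # hypothetical label, not in the table

binmode STDOUT, ':encoding(UTF-8)';   # &#8734; decodes to a wide character
print 'Disk Space Usage: ', (defined $usage   ? $usage   : 'not found'), "\n";
print 'Inodes: ',           (defined $missing ? $missing : 'not found'), "\n";
```

Checking the return value of look_down before chaining works just as well; eval is simply the shortest way to protect the whole chain at once.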

The tree navigation methods in HTML::Element are key to using HTML::TreeBuilder effectively.


Mohini asks, "Why doesn't this work?"

(formatting added by me)

use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_file( "index.html");
my $disk_value; my $disk_space;

for ( $tree->look_down( _tag => q{tr}, 'class' => 'row-odd' ) ) {

    $disk_space = $tree->look_down(
         _tag => q{td},
         'class' => 'stats_left'
    )->as_trimmed_text;

    if ( $disk_space eq 'Home Directory' ) {
        $disk_value = $tree->look_down( _tag => q{td}, 'class' => 'stats_right' )
                           ->right()
                           ->as_trimmed_text();
    }

}

print STDERR "my home value is $disk_space : $disk_value\n";

look_down starts at the node you invoke it on and searches down the element tree below it (these trees grow upside down). In list context it returns every matching node; in scalar context it returns the first match, or undef if there is none.
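A minimal sketch of the two contexts, assuming a trimmed copy of the question's table (three of its rows) parsed from a string:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(<<'HTML');
<table id="stats"><tbody>
<tr class="row-even"><td class="stats_left">Main Domain</td><td class="stats_right"><b>myabcab.com</b></td></tr>
<tr class="row-odd"><td class="stats_left">Home Directory</td><td class="stats_right">/home/abc</td></tr>
<tr class="row-odd"><td class="stats_left">Disk Space Usage</td><td class="stats_right">30.2 / &#8734; MB<br></td></tr>
</tbody></table>
HTML

# List context: every matching element, in document order.
my @labels = $tree->look_down(_tag => 'td', class => 'stats_left');

# Scalar context: only the first matching element (undef if none).
my $first = $tree->look_down(_tag => 'td', class => 'stats_left');

printf "%d labels; first is '%s'\n", scalar @labels, $first->as_trimmed_text();
```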

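Since every look_down call inside your loop is made on $tree, each pass through the loop finds the same nodes: the first stats_left cell in the document (Main Domain). So $disk_space is never 'Home Directory', and the loop variable is never used. Even if the test did succeed, the first stats_right cell has no right sibling, so ->right() would return undef and the chained call would die.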
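To see the effect, this sketch runs the same search from $tree and from the loop's row. It again uses a trimmed copy of the table, and the arrays @from_root and @from_row are hypothetical, added only to collect the results for display.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(<<'HTML');
<table id="stats"><tbody>
<tr class="row-even"><td class="stats_left">Main Domain</td><td class="stats_right"><b>myabcab.com</b></td></tr>
<tr class="row-odd"><td class="stats_left">Home Directory</td><td class="stats_right">/home/abc</td></tr>
<tr class="row-odd"><td class="stats_left">Disk Space Usage</td><td class="stats_right">30.2 / &#8734; MB<br></td></tr>
</tbody></table>
HTML

my (@from_root, @from_row);
for my $row ($tree->look_down(_tag => 'tr', class => 'row-odd')) {
    # Searching from $tree ignores $row: always the first stats_left cell.
    push @from_root, $tree->look_down(_tag => 'td', class => 'stats_left')->as_trimmed_text();

    # Searching from $row only sees the cells inside that row.
    push @from_row, $row->look_down(_tag => 'td', class => 'stats_left')->as_trimmed_text();
}

print "from \$tree: @from_root\n";
print "from \$row:  @from_row\n";
```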

Your loop should look something more like this:

my %table_stuff;

for my $odd_row ( $tree->look_down( _tag => q{tr}, 'class' => 'row-odd' ) ) {

    # Search only within this row, not the whole tree.
    my $heading = $odd_row->look_down(
         _tag => q{td},
         'class' => 'stats_left'
    );

    # Map the label text to the text of the cell on its right.
    $table_stuff{ $heading->as_trimmed_text() } = $heading->right()->as_trimmed_text();
}

This populates %table_stuff, mapping each row-odd row's heading text to the text of the cell on its right.
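A self-contained sketch of the same pattern, extended to every row of the stats table rather than only the row-odd ones. The table is a trimmed copy of the question's (the Last login row and the progress-bar markup are left out), and each lookup is guarded with `or next` so a row without the expected cells is skipped instead of dying.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(<<'HTML');
<table id="stats"><tbody>
<tr class="row-even"><td class="stats_left">Main Domain</td><td class="stats_right"><b>myabcab.com</b></td></tr>
<tr class="row-odd"><td class="stats_left">Home Directory</td><td class="stats_right">/home/abc</td></tr>
<tr class="row-odd"><td class="stats_left">Disk Space Usage</td><td class="stats_right">30.2 / &#8734; MB<br></td></tr>
</tbody></table>
HTML

my %table_stuff;

# Start from the stats table itself so other tables on the page are ignored.
my $table = $tree->look_down(_tag => 'table', id => 'stats')
    or die "no stats table\n";

# Every <tr>, whether it is row-even or row-odd.
for my $row ($table->look_down(_tag => 'tr')) {
    my $heading = $row->look_down(_tag => 'td', class => 'stats_left') or next;
    my $value   = $heading->right()                                    or next;
    $table_stuff{ $heading->as_trimmed_text() } = $value->as_trimmed_text();
}

binmode STDOUT, ':encoding(UTF-8)';   # &#8734; decodes to a wide character
print "$_ => $table_stuff{$_}\n" for sort keys %table_stuff;
```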

If you only want one value, don't use a loop at all; look_down already does the searching for you.

my $heading = $tree->look_down(
    _tag => 'td',
    sub {
        $_[0]->as_trimmed_text() =~ /^Home Directory$/
    }
);

my $value = $heading->right();

#  Now $heading and $value have HTML::Element nodes that you can do whatever you want with.

my $disk_value = $value->as_trimmed_text();
my $disk_space = $heading->as_trimmed_text();
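One caveat when applying this to the row the question actually asks about: the Disk Space Usage value cell also contains the hidden progress-bar markup, so as_trimmed_text() on it picks up that div's text (the trailing 0) as well. A minimal sketch, assuming you only want the leading text, takes the cell's first child from content_list, which is a plain string when it is a text node. The table below is a trimmed copy of that one row (the inline style attribute is left out).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(<<'HTML');
<table id="stats"><tbody>
<tr class="row-odd">
    <td class="stats_left">Disk Space Usage</td>
    <td class="stats_right">30.2 / &#8734; MB<br>
    <div class="stats_progress_bar">
    <div class="cpanel_widget_progress_bar" title="0%"></div>
    <div class="cpanel_widget_progress_bar_percent" style="display: none">0</div>
    </div>
    </td>
</tr>
</tbody></table>
HTML

my $heading = $tree->look_down(
    _tag => 'td',
    sub { $_[0]->as_trimmed_text() eq 'Disk Space Usage' },
) or die "no Disk Space Usage row\n";
my $value = $heading->right();

# Whole-cell text: also includes the hidden progress-bar percentage.
my $all_text = $value->as_trimmed_text();

# Only the first child node: text nodes are plain strings, elements are refs.
my ($first) = $value->content_list();
my $disk_value = ref $first ? $first->as_trimmed_text() : $first;
$disk_value =~ s/^\s+|\s+$//g;

binmode STDOUT, ':encoding(UTF-8)';   # &#8734; decodes to a wide character
print "cell text:  $all_text\n";
print "disk value: $disk_value\n";
```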

Upvotes: 4
