Reputation: 60
I HAVE SOLVED THIS: It turns out the page I was loading with WWW::Mechanize uses AJAX to load all the content inside the <tbody>, so that content was not yet there when I created the $html variable.
Now I need to work out how to get this dynamic content...
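One way to confirm this diagnosis is to count the rows inside the <tbody> of the HTML that was actually fetched. The following is only a minimal sketch: the markup is a hypothetical stand-in for the static page, reusing the ids 'table-id' and 'tbody-id' from the question below.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the static HTML returned by the server:
# the <tbody> exists but is empty, because the rows are added later by JavaScript.
my $html = <<'HTML';
<table id="table-id">
<thead><tr><th>Header</th></tr></thead>
<tbody id="tbody-id"></tbody>
</table>
HTML

my $tree  = HTML::TreeBuilder->new_from_content($html);
my $tbody = $tree->look_down( _tag => 'tbody', id => 'tbody-id' );

# In list context look_down returns every matching element.
my @rows = $tbody ? $tbody->look_down( _tag => 'tr' ) : ();

print @rows
    ? "tbody has " . scalar(@rows) . " row(s)\n"
    : "tbody is empty in the fetched HTML; the rows are probably loaded by JavaScript\n";

$tree->delete;
```

If the count is zero for the real page while the browser shows rows, the data most likely arrives through a separate request that has to be fetched on its own.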
I am trying to parse the content of a table in a webpage. The <table> contains a <thead> and a <tbody>. When I try to get the content from the <tbody> part of the table, none of it is there; I only get the content that is inside the <thead>.
I have tried a few different methods, shown below, all of which give me nothing from inside the <tbody>.
Using HTML::TreeBuilder:
my $tb = HTML::TreeBuilder->new();
$tb->parse($html);
$tb->eof();    # tell the parser the input is complete
my $table = $tb->look_down( _tag => 'tbody', id => 'tbody-id' );
Using HTML::TableExtract:
my $te = HTML::TableExtract->new( attribs => { id => 'table-id' } );
$te->parse($html);
my $table = $te->first_table_found;
When I do a print Dumper($table); I can see that the <table> is being found, but I only see either the table content inside the <thead>, or the <tbody> with a reference to its parent, which contains all the content from the <thead>.
I couldn't care less about the content in the <thead>; I just need the table content out of the <tbody>.
I am not sure what I am doing wrong and where to go from here.
Upvotes: 2
Views: 382
Reputation: 21659
Is the HTML valid? It took me a few minutes to get the following code working because I'd not properly closed one of the tags:
use strict;
use warnings;
use HTML::TreeBuilder;
use Perl6::Say;
my $html = << 'HTML';
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>title</title>
<link rel="stylesheet" href="style.css">
<script src="script.js"></script>
</head>
<body>
<table>
<caption>Caption</caption>
<thead>
<tr>
<th>Header</th>
</tr>
</thead>
<tbody>
<tr>
<td>Body</td>
</tr>
</tbody>
</table>
</body>
</html>
HTML
my $tree = HTML::TreeBuilder->new->parse_content($html);
my $table = $tree->look_down('_tag', 'table');
my $caption = $table->look_down('_tag', 'caption');
my $thead = $table->look_down('_tag', 'thead');
my $tbody = $table->look_down('_tag', 'tbody');
say $caption->as_HTML;
# <caption>Caption</caption>
say $thead->as_HTML;
# <thead><tr><th>Header</th></tr></thead>
say $tbody->as_HTML;
# <tbody><tr><td>Body</td></tr></tbody>
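Once the <tbody> element is found, its cells can be collected row by row. This is a rough sketch rather than a definitive approach: the two-row table is made-up sample data, and @data is just an illustrative name for the result.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample table; only the <tbody> rows are of interest.
my $html = <<'HTML';
<table>
<thead><tr><th>Header</th></tr></thead>
<tbody>
<tr><td>Body 1</td><td>Body 2</td></tr>
<tr><td>Body 3</td><td>Body 4</td></tr>
</tbody>
</table>
HTML

my $tree  = HTML::TreeBuilder->new_from_content($html);
my $tbody = $tree->look_down( _tag => 'tbody' )
    or die "no <tbody> found\n";

# Build an array of rows, each row an array ref of trimmed cell text.
my @data;
for my $tr ( $tbody->look_down( _tag => 'tr' ) ) {
    push @data, [ map { $_->as_trimmed_text } $tr->look_down( _tag => 'td' ) ];
}

# Print one tab-separated line per row.
print join( "\t", @$_ ), "\n" for @data;

$tree->delete;
```

Searching from $tbody rather than from the whole tree keeps the <thead> cells out of the result.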
Upvotes: 1