Reputation: 35
I have read many questions and answers but could not find a direct answer to mine. The answers were either very general or did something different from what I want to do. I got as far as learning that I need to use HTML::TableExtract or HTML::TreeBuilder::XPath, but I could not work out how to use them to store the values. I did manage to get the table row values and show them with Dumper.
Something like this:
my @fir;
foreach my $ts ($tree->table_states) {
    foreach my $row ($ts->rows) {
        # store a Data::Dumper dump of each row
        push @fir, Dumper($row);
    }
}
print @fir;
But this does not really do what I am looking for. Here is the structure of the HTML table whose values I want to store:
<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
<tr>
<th ><p>Foo</p>
</th>
<td ><p>Bar</p>
</td>
</tr>
<tr>
<th ><p>Foo-1</p>
</th>
<td ><p>Bar-1</p>
</td>
</tr>
<tr>
<th ><p>Formula</p>
</th>
<td><p>Formula1-1</p>
<p>Formula1-2</p>
<p>Formula1-3</p>
<p>Formula1-4</p>
<p>Formula1-5</p>
</td>
</tr>
<tr>
<th><p>Foo-2</p>
</th>
<td ><p>Bar-2</p>
</td>
</tr>
<tr>
<th ><p>Foo-3</p>
</th>
<td ><p>Bar-3</p>
<p>Bar-3-1</p>
</td>
</tr>
</tbody>
</table>
It would be convenient if I could store the values of each row together as a pair.
The expected output would be something like an array with the values (Foo, Bar, Foo-1, Bar-1, Formula, Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5, ...). The important thing for me is to learn how to store the values of each tag and how to move around in the tag tree.
Upvotes: 1
Views: 201
Reputation: 39158
Learn XPath and DOM manipulation.
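use strictures;
use HTML::TreeBuilder::XPath qw();
# Parse the saved page into a DOM tree that can be queried with XPath.
my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse_file('10280979.html');
my %extract;
# Hash slice: the text of each <th> becomes a key, and the <td> at the same
# position (in document order) supplies an array ref of its <p> texts.
@extract{$dom->findnodes_as_strings('//th')} =
    map {[$_->findvalues('p')]} $dom->findnodes('//td');
__END__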
# %extract = (
# Foo => [qw(Bar)],
# 'Foo-1' => [qw(Bar-1)],
# 'Foo-2' => [qw(Bar-2)],
# 'Foo-3' => [qw(Bar-3 Bar-3-1)],
# Formula => [qw(Formula1-1 Formula1-2 Formula1-3 Formula1-4 Formula1-5)],
# )
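The hash above pairs headers with cells by position and does not keep the rows in order. If you want the flat, ordered list from the question, one option is to walk the table row by row. This is only a minimal sketch, assuming every <tr> holds exactly one <th> and one <td>. The inline $html is a shortened copy of the table from the question, and the names $html and @pairs are just illustrative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Shortened copy of the question's table, inlined so the sketch is self-contained.
my $html = <<'HTML';
<table><caption><b>Table 1 </b>bla bla bla</caption>
<tbody>
<tr><th><p>Foo</p></th><td><p>Bar</p></td></tr>
<tr><th><p>Formula</p></th><td><p>Formula1-1</p>
<p>Formula1-2</p></td></tr>
<tr><th><p>Foo-3</p></th><td><p>Bar-3</p>
<p>Bar-3-1</p></td></tr>
</tbody>
</table>
HTML

my $dom = HTML::TreeBuilder::XPath->new;
$dom->parse($html);
$dom->eof;

my @pairs;
for my $row ($dom->findnodes('//table//tr')) {
    # XPath relative to the current row: header text, then every <p> in the cell.
    my $key    = $row->findvalue('./th/p');
    my @values = $row->findvalues('./td/p');
    # One header, then one space-joined string of that row's cell values.
    push @pairs, $key, join(' ', @values);
}
$dom->delete;    # free the parse tree

print join(' , ', @pairs), "\n";
```

With the shortened table this prints Foo , Bar , Formula , Formula1-1 Formula1-2 , Foo-3 , Bar-3 Bar-3-1. Querying relative to each row ('./th/p', './td/p') is also how you move around the tree: every node returned by findnodes can itself be the starting point of another XPath query.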
Upvotes: 3