Reputation: 143
If I have HTML of the form
<ol>
<li>Cheeses
<ol>
<li>Red Leicester</li>
<li>Cheddar</li>
</ol>
<li>Wines
<ol>
<li>Burgundy</li>
<li>Beaujolais</li>
</ol>
</ol>
I would like to parse it into a structure something like
{"Cheeses":["Red Leicester", "Cheddar"], "Wines":["Burgundy", "Beaujolais"]}
There are many "tutorials" on how to use modules like HTML::TreeBuilder or Mojo::DOM to parse HTML, but they always seem to rely on "id=" or "class=" attributes. The HTML I want to parse has no id, class, or other attributes. How can I do this?
Upvotes: 0
Views: 137
Reputation: 20280
I only have experience with Mojo::DOM, and admittedly you might find a better module for converting your HTML to a data structure. If you are using Mojo::DOM, you might want to look at the tree structure underlying the Mojo::DOM object:
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
use Data::Dumper;
my $dom = Mojo::DOM->new(<<'END');
<ol>
<li>Cheeses
<ol>
<li>Red Leicester</li>
<li>Cheddar</li>
</ol>
<li>Wines
<ol>
<li>Burgundy</li>
<li>Beaujolais</li>
</ol>
</ol>
END
print Dumper $dom->tree;
With a little massaging you might be able to get that into the form you want. Perhaps someone knows of a module that goes more directly from HTML to such a structure. (An XML-only parser would likely reject this markup as it stands, because the outer <li> elements are never closed.)
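One way to do that massaging without touching the raw tree is to walk the list with Mojo::DOM's own traversal methods (children, find, text). This is only a minimal sketch: it assumes exactly two levels of nesting and that each outer item's label is its own text, and %menu is just a name made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::Util qw(trim);
use Data::Dumper;

my $dom = Mojo::DOM->new(<<'END');
<ol>
<li>Cheeses
<ol>
<li>Red Leicester</li>
<li>Cheddar</li>
</ol>
<li>Wines
<ol>
<li>Burgundy</li>
<li>Beaujolais</li>
</ol>
</ol>
END

my %menu;    # hypothetical name for the resulting structure

# The outer items are the <li> children of the top-level <ol>.
for my $item ($dom->children('ol')->first->children('li')->each) {

    # text() returns only this element's own text nodes, so the
    # nested list is excluded; trim() removes surrounding whitespace.
    my $name = trim($item->text);

    # Every <li> below this item belongs to its nested list
    # (assumption: the list is only two levels deep).
    $menu{$name} = $item->find('li')->map(sub { trim($_->text) })->to_array;
}

$Data::Dumper::Sortkeys = 1;
print Dumper \%menu;
```

No id or class attributes are needed because the selection is purely structural. For deeper nesting, the loop body would have to become a recursive function.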
Upvotes: 1