Reputation: 25
I have a page I want to parse that contains overlapping tags, like this:
<div>
<p>
<strong>
<span>sometext</span>
<div> <- this tag is misplaced
</strong>
</p>
<- and should be here
</div>
The problem is that there are more p tags to be parsed, but the parser thinks it has reached the end.
I need the page parsed in a way that lets me access each p separately:
$ar_w = $ar->find('div[itemprop=ar] p');
foreach ($ar_w as $para) {
//something
}
Any ideas on how to solve this?
Upvotes: 1
Views: 61
Reputation: 943569
Your HTML is invalid. You can't have a <div> inside a <p> (but since the end tag for <p> is optional, the <div> will implicitly end it, and then the </p> will be ignored because there is no matching <p>), or inside a <strong>. You also can't have a <div> start tag without a matching end tag.

If you want to recover from the HTML errors in a particular, non-standard way, you'll need to write a custom parser. Pre-built ones tend to follow the HTML rules.
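
Short of a full custom parser, one option is to repair the markup before handing it to the existing parser. The following is only a minimal sketch of that idea, not a general solution: the function name deferDivsOpenedInsideP is hypothetical, and it assumes that every <div> start tag found inside an open <p> is misplaced and belongs right after the matching </p>. It also assumes simple markup, with no ">" inside attribute values, comments, or scripts.

```php
<?php
// Hypothetical helper (not part of any library): moves every <div> start tag
// that appears inside an open <p> to just after the closing </p>.
// Assumption: such a <div> is always misplaced, as in the question's sample.
function deferDivsOpenedInsideP(string $html): string
{
    // Split the input into tags and text runs, keeping the tags themselves.
    $tokens = preg_split(
        '/(<[^>]+>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out = '';
    $pDepth = 0;     // number of <p> elements currently open
    $deferred = [];  // <div> start tags waiting for the closing </p>

    foreach ($tokens as $token) {
        if (preg_match('/^<p\b/i', $token)) {
            // <p> start tag (does not match <pre>, <param>, ...).
            $pDepth++;
            $out .= $token;
        } elseif (preg_match('/^<\/p\s*>/i', $token)) {
            // </p> end tag: emit it, then release any deferred <div> tags.
            $pDepth = max(0, $pDepth - 1);
            $out .= $token;
            if ($pDepth === 0) {
                $out .= implode('', $deferred);
                $deferred = [];
            }
        } elseif ($pDepth > 0 && preg_match('/^<div\b/i', $token)) {
            // Misplaced <div> start tag inside a <p>: hold it back for now.
            $deferred[] = $token;
        } else {
            $out .= $token;
        }
    }

    // If a <p> was never closed, emit the held-back tags at the end.
    return $out . implode('', $deferred);
}
```

The repaired string can then be passed to whatever parser produced $ar in the question, and the 'div[itemprop=ar] p' selector run against the result. Note that this only relocates the start tag; a <div> that still lacks an end tag remains unclosed, and the parser will have to recover from that on its own.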
Upvotes: 1