Dembele

Reputation: 25

PHP simple HTML DOM tag overlapping

I've got a page I want to parse that has overlapping tags, like this:

 <div>
  <p>
   <strong>
    <span>sometext</span>
     <div> <- this tag is misplaced
   </strong>
  </p>
       <- and should be here
     </div>

The problem is that there are more p tags to parse, but the parser thinks it has reached the end.

I need it parsed in a way that lets me access each p separately:

$ar_w = $ar->find('div[itemprop=ar] p');
foreach ($ar_w as $para) {
    // something
}

Any ideas how to solve this?

Upvotes: 1

Views: 61

Answers (1)

Quentin

Reputation: 943569

Your HTML is invalid.

  • You cannot put a <div> inside a <p> (but since the end tag for <p> is optional, the <div> will implicitly end it and then the </p> will be ignored because there is no matching <p>).
  • You cannot put a <div> inside a <strong>
  • You cannot have a <div> start tag without a matching end tag

If you want to recover from the HTML errors in a particular, non-standard way, you'll need to write a custom parser. Pre-built ones tend to follow the HTML rules.
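
Short of a full custom parser, one option is to repair the markup as a string before handing it to the parser. The sketch below is only an illustration under strong assumptions: every <p> has an explicit </p>, paragraphs are not nested, and any <div> start tag inside a paragraph is misplaced and belongs right after the closing </p>. The function name moveDivsOutOfParagraphs and the sample input are hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: moves every <div> start tag found inside a
// <p>...</p> block to just after that block's closing </p>.
// Assumes explicit </p> tags and no nested paragraphs.
function moveDivsOutOfParagraphs(string $html): string
{
    $result = preg_replace_callback(
        '~<p\b[^>]*>.*?</p>~is',
        function (array $m): string {
            $moved = [];
            // Collect and strip the <div ...> start tags inside this paragraph.
            $clean = preg_replace_callback(
                '~<div\b[^>]*>~i',
                function (array $d) use (&$moved): string {
                    $moved[] = $d[0];
                    return '';
                },
                $m[0]
            );
            // Fall back to the untouched paragraph if the regex engine fails.
            return ($clean ?? $m[0]) . implode('', $moved);
        },
        $html
    );

    // Fall back to the original input if the regex engine fails.
    return $result ?? $html;
}

// Sample input modelled on the markup in the question (assumed, simplified).
$broken = '<div><p><strong><span>sometext</span><div></strong></p></div><p>more</p>';
$fixed  = moveDivsOutOfParagraphs($broken);

echo $fixed, PHP_EOL;
```

The repaired string could then be passed to the parser (for Simple HTML DOM, str_get_html($fixed)) and queried with find() as in the question. Regex-based repair like this is brittle, so it is only reasonable when the broken pattern in the page is known and consistent.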

Upvotes: 1
