esod

Reputation: 345

Parse the h2 and the next tag in PHP

I need to create an array from the following string.

$body = '<h2>Heading one</h2>
         <p>Lorem ipsum dolor</p>

         <h2>Heading two</h2>
         <ul>
           <li>list item one.</li>
           <li>List item two.</li>
         </ul>

         <h2>Heading three</h2>
         <table class="table">
           <tbody>
             <tr>
               <td>Table data one</td>
               <td>Description of table data one</td>
             </tr>
             <tr>
               <td>Table data two</td>
               <td>Description of table data two</td>
             </tr>
           </tbody>
         </table>';

I can use the h2 tag as the first index to get the 'question' value.

$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);

$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
  $next_element = $xPath->query('./following-sibling::p', $tag);
  $question_answer[] = [
    'question' => $tag->nodeValue,
    'answer' =>  $next_element->item(0)->nodeValue,
  ];
}

echo '<pre>';
print_r($question_answer);
echo '</pre>';

After incorporating @Kevin's suggestion, the code works well for the p tag and produces the following output:

Array
(
    [0] => Array
        (
            [question] => Heading one
            [answer] => Lorem ipsum dolor
        )

    [1] => Array
        (
            [question] => Heading two
            [answer] => 
        )

    [2] => Array
        (
            [question] => Heading three
            [answer] => 
        )

)

Now I just have to work out the answer for cases where the next tag is an unordered list or a table. For tables, I'm only interested in the td tags.

Upvotes: 1

Views: 429

Answers (2)

esod

Reputation: 345

We're excluding the table markup for now because it's probably not relevant in this use case. Here's the content:

$body = '<h2>Heading one</h2>
       <p>Lorem ipsum dolor</p>

       <h2>Heading two</h2>
       <ul>
         <li>List item one.</li>
         <li>List item two.</li>
       </ul>';

Here is the solution code:

$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);

$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
  $possible_answer = $xPath->query('./following-sibling::p | ./following-sibling::ul', $tag);

  if ($possible_answer->length <= 0) {
    continue;
  }

  if ($possible_answer->item(0)->tagName === 'p') {
    $answer = $possible_answer->item(0)->nodeValue;
    $question_answer[] = [
      'question' => $tag->nodeValue,
      'answer' => $answer,
    ];
  }
  elseif ($possible_answer->item(0)->tagName === 'ul') {
    $li_dom = [];
    foreach ($possible_answer->item(0)->getElementsByTagName('li') as $li) {
      $li_dom[] = $li->nodeValue;
    }
    $li_dom = implode(" ", $li_dom);

    $question_answer[] = [
      'question' => $tag->nodeValue,
      'answer' => $li_dom,
    ];
  }
}

echo '<pre>';
print_r($question_answer);
echo '</pre>';

Here is the output:

Array
(
    [0] => Array
        (
            [question] => Heading one
            [answer] => Lorem ipsum dolor
        )

    [1] => Array
        (
            [question] => Heading two
            [answer] => List item one. List item two.
        )

)
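
For the table case that the question asks about, the same union query could be extended with a third branch. The following is only a minimal sketch under a few assumptions: only the td text is wanted, the cells are joined with spaces the same way the li values are joined above, and join_descendant_text() is a hypothetical helper introduced here for illustration (it is not part of the DOM extension).

```php
<?php
// Sketch only: adds a `table` branch to the p/ul handling shown above.
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul><li>List item one.</li><li>List item two.</li></ul>
<h2>Heading three</h2>
<table class="table"><tbody>
<tr><td>Table data one</td><td>Description of table data one</td></tr>
<tr><td>Table data two</td><td>Description of table data two</td></tr>
</tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

// Hypothetical helper: joins the trimmed text of every descendant
// element that has the given tag name, separated by single spaces.
function join_descendant_text(DOMElement $element, string $tagName): string
{
    $parts = [];
    foreach ($element->getElementsByTagName($tagName) as $node) {
        $parts[] = trim($node->nodeValue);
    }
    return implode(' ', $parts);
}

$question_answer = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // The union is returned in document order, so item(0) is the
    // first p, ul, or table that follows the current heading.
    $candidates = $xPath->query(
        './following-sibling::p | ./following-sibling::ul | ./following-sibling::table',
        $tag
    );
    if ($candidates->length === 0) {
        continue; // no usable element after this heading
    }

    $first = $candidates->item(0);
    if ($first->tagName === 'ul') {
        $answer = join_descendant_text($first, 'li');
    } elseif ($first->tagName === 'table') {
        $answer = join_descendant_text($first, 'td'); // td cells only
    } else {
        $answer = trim($first->nodeValue);
    }

    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $answer,
    ];
}

print_r($question_answer);
```

With this sketch, each heading gets one entry, and the table's answer is the text of its four td cells joined by spaces.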

Upvotes: 0

Kevin

Reputation: 41875

Since you're iterating over each h2 tag, use following-sibling::p relative to the current tag.

foreach ($tags as $tag) {
    $next_element = $xPath->query('./following-sibling::p', $tag);
    if ($next_element->length <= 0) continue; //skip it if p not found
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $next_element->item(0)->nodeValue,
    ];
}
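
One thing to keep in mind is that following-sibling::p matches every later p sibling, not only the one directly after the heading. If only the adjacent element should count, a positional predicate such as following-sibling::*[1] may be a better fit. The snippet below is a small sketch of the difference; the $html string and variable names are made up for the demonstration.

```php
<?php
// Sketch only: "any later <p>" versus "the element right after the <h2>".
$html = '<h2>First</h2><ul><li>Item</li></ul><h2>Second</h2><p>Paragraph</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$first_h2 = $dom->getElementsByTagName('h2')->item(0);

// Matches every later <p> sibling, so the first hit here is the
// paragraph that actually belongs to the "Second" heading.
$any_p = $xPath->query('./following-sibling::p', $first_h2);

// Matches only the element immediately after the heading, whatever its tag.
$adjacent = $xPath->query('./following-sibling::*[1]', $first_h2);

echo $any_p->item(0)->nodeValue, PHP_EOL;   // Paragraph
echo $adjacent->item(0)->tagName, PHP_EOL;  // ul
```

The tag name of the adjacent element can then be used to decide how to build the answer, for example by collecting li or td text.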

Upvotes: 2
