Reputation: 345
I need to create an array from the following string.
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>list item one.</li>
<li>List item two.</li>
</ul>
<h2>Heading three</h2>
<table class="table">
<tbody>
<tr>
<td>Table data one</td>
<td>Description of table data one</td>
</tr>
<tr>
<td>Table data two</td>
<td>Description of table data two</td>
</tr>
</tbody>
</table>';
I can use the h2 tag as the first index to get the 'question' value.
$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);
$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
    $next_element = $xPath->query('./following-sibling::p', $tag);
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $next_element->item(0)->nodeValue,
    ];
}
echo '<pre>';
print_r($question_answer);
echo '</pre>';
Incorporating @Kevin's suggestion works great for the p tag and produces the following output:
Array
(
[0] => Array
(
[question] => Heading one
[answer] => Lorem ipsum dolor
)
[1] => Array
(
[question] => Heading two
[answer] =>
)
[2] => Array
(
[question] => Heading three
[answer] =>
)
)
Now I just have to fill in the answer value when the next tag is an unordered list or a table. For the tables, I'm only interested in the td tags.
Upvotes: 1
Views: 429
Reputation: 345
We're excluding the table markup for now because it's probably not relevant in this use case. Here's the content:
$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul>
<li>List item one.</li>
<li>List item two.</li>
</ul>';
Here is the solution code:
$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);
$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
    // The union is returned in document order, so item(0) is the nearest
    // following p or ul sibling of the current h2.
    $possible_answer = $xPath->query('./following-sibling::p | ./following-sibling::ul', $tag);
    if ($possible_answer->length <= 0) {
        continue; // no p or ul after this heading
    }
    $first = $possible_answer->item(0);
    if ($first->tagName === 'p') {
        $answer = $first->nodeValue;
    } elseif ($first->tagName === 'ul') {
        // Join the text of every list item into a single string.
        $items = [];
        foreach ($first->getElementsByTagName('li') as $li) {
            $items[] = $li->nodeValue;
        }
        $answer = implode(" ", $items);
    }
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $answer,
    ];
}
echo '<pre>';
print_r($question_answer);
echo '</pre>';
Here is the output:
Array
(
[0] => Array
(
[question] => Heading one
[answer] => Lorem ipsum dolor
)
[1] => Array
(
[question] => Heading two
[answer] => List item one. List item two.
)
)
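If the table does turn out to matter, the same idea can probably be stretched to cover it. The following is only a rough sketch, not a tested drop-in: the function name extractQuestionAnswers is made up for illustration, and it assumes the answer is always the element directly after each h2 and that joining the td texts with a space is acceptable.

```php
<?php
// Hypothetical helper (name invented for this sketch): pairs each h2 with the
// text of the element that directly follows it (p, ul, or table).
function extractQuestionAnswers(string $html): array
{
    // Keep libxml parse warnings out of the output, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xPath = new DOMXPath($dom);
    $result = [];

    foreach ($dom->getElementsByTagName('h2') as $heading) {
        // *[1] selects the first following element sibling, skipping whitespace text nodes.
        $next = $xPath->query('./following-sibling::*[1]', $heading)->item(0);
        if ($next === null) {
            continue; // heading is the last element
        }

        if ($next->nodeName === 'p') {
            $nodes = [$next];
        } elseif ($next->nodeName === 'ul') {
            $nodes = iterator_to_array($xPath->query('./li', $next));
        } elseif ($next->nodeName === 'table') {
            // Assumption: only td cells are wanted, as stated in the question.
            $nodes = iterator_to_array($xPath->query('.//td', $next));
        } else {
            continue; // some other element follows the heading
        }

        $parts = [];
        foreach ($nodes as $node) {
            $parts[] = trim($node->nodeValue);
        }

        $result[] = [
            'question' => trim($heading->nodeValue),
            'answer' => implode(' ', $parts),
        ];
    }

    return $result;
}
```

Called as extractQuestionAnswers($body) with the full markup from the question, this should return three entries, with the td texts of the table joined into the third answer.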
Upvotes: 0
Reputation: 41875
Since you're iterating on each h2 tag, use following-sibling::p relative to the current tag.
foreach ($tags as $tag) {
    $next_element = $xPath->query('./following-sibling::p', $tag);
    if ($next_element->length <= 0) continue; // skip it if p not found
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => $next_element->item(0)->nodeValue,
    ];
}
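One caveat: following-sibling::p matches any later p sibling, not just the one right after the heading, so a heading without its own paragraph can pick up a paragraph that belongs to a later heading. Below is a small sketch of the difference. The sample markup and the variable names $loose and $strict are invented for this illustration. Adding *[1][self::p] restricts the match to a p that is the very next element.

```php
<?php
// Hypothetical sample markup: the second h2 has no paragraph of its own.
$html = '<h2>First</h2><p>Answer one</p>'
      . '<h2>Second</h2><ul><li>Item</li></ul>'
      . '<h2>Third</h2><p>Answer three</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$loose = [];
$strict = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // Matches every later p sibling, however far away it is.
    $any = $xPath->query('./following-sibling::p', $tag);
    // Matches only when the very next element sibling is a p.
    $adjacent = $xPath->query('./following-sibling::*[1][self::p]', $tag);

    $loose[$tag->nodeValue] = $any->length > 0 ? $any->item(0)->nodeValue : null;
    $strict[$tag->nodeValue] = $adjacent->length > 0 ? $adjacent->item(0)->nodeValue : null;
}

print_r($loose);  // 'Second' is paired with 'Answer three'
print_r($strict); // 'Second' has no answer (null)
```

Which variant fits depends on whether a heading is allowed to have other elements between it and its paragraph.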
Upvotes: 2