Reputation: 3011
I've been trying to figure out how to combine two pieces of extracted text into a single result (an array entry): in this case, the title and subtitle of each book in a list.
<td class="item_info">
<span class="item_title">Carrots Like Peas</span>
<em class="item_subtitle">- And Other Fun Facts</em>
</td>
The closest I've been able to get is:
$holds = $crawler->filter('span.item_title,em.item_subtitle');
Which I've managed to output with the following:
$holds->each(function ($node) {
    echo '<pre>';
    print $node->text();
    echo '</pre>';
});
And results in
<pre>Carrots Like Peas</pre>
<pre>- And Other Fun Facts</pre>
Another problem is that not all the books have subtitles, so I need to avoid combining two titles together. How would I go about combining those two into a single result (or array)?
Upvotes: 0
Views: 681
Reputation: 3011
In my case, I took a roundabout way to get where I wanted to be. I stepped back one level in the DOM to the td tag, grabbed everything, and dumped it into an array.
DomCrawler's documentation turned out to have example code for placing the text of each node into an array:
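use Symfony\Component\DomCrawler\Crawler; // needed for the Crawler type hint below

// each() returns an array holding the closure's return value for every matched node
$items_out = $crawler->filter('td.item_info')->each(function (Crawler $node, $i) {
    return $node->text();
});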
I'd tried to avoid capturing the td because authors were also included in those cells. After even more digging, I was able to strip the authors from the array with the following:
foreach ($items_out as $i => $item) {
    $pos = strpos($item, ' - by'); // false when the cell has no author suffix
    $items_out[$i] = ($pos === false) ? $item : substr($item, 0, $pos);
}
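The original goal, one entry per book with an optional subtitle, can also be reached by querying inside each cell instead of trimming the combined text. The following is only a minimal sketch of that idea: it uses PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than DomCrawler so that it runs without Composer, and the sample markup, the author text, and the `$books` array shape are assumptions made up for illustration.

```php
<?php
// Sketch using PHP's built-in DOM extension, not DomCrawler itself.
// The markup is hypothetical, modeled on the snippet in the question.
$html = <<<HTML
<table><tr>
<td class="item_info">
<span class="item_title">Carrots Like Peas</span>
<em class="item_subtitle">- And Other Fun Facts</em>
 - by A. Author
</td>
<td class="item_info">
<span class="item_title">Plain Title</span>
 - by B. Writer
</td>
</tr></table>
HTML;

$doc = new DOMDocument();
// Silence the warnings libxml raises on imperfect real-world HTML.
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$books = [];
// Note: @class="..." is an exact match on the whole class attribute.
foreach ($xpath->query('//td[@class="item_info"]') as $cell) {
    // Query relative to the current cell so one book's title is never
    // paired with another book's subtitle.
    $title = $xpath->query('.//span[@class="item_title"]', $cell)->item(0);
    $subtitle = $xpath->query('.//em[@class="item_subtitle"]', $cell)->item(0);
    if ($title === null) {
        continue; // skip cells without a title
    }
    $books[] = [
        'title' => trim($title->textContent),
        // item(0) is null when the book has no subtitle
        'subtitle' => $subtitle !== null ? trim($subtitle->textContent) : null,
    ];
}

print_r($books);
```

Because only the title and subtitle elements are read, the author text never enters the result and no string trimming is needed. With DomCrawler, the same shape should be reachable by calling `filter()` on each `td.item_info` node inside `each()` and checking `count()` on the subtitle match before calling `text()`.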
Just took me five days to get it all sorted out. Now onto the next problem!
Upvotes: 1
Reputation: 2047
As per the Goutte documentation, Goutte uses the Symfony DomCrawler component. Information on adding content to a DomCrawler object can be found at Symfony DomCrawler - Adding Content.
Upvotes: 0