haxpanel

Reputation: 4678

What's the most efficient/nicest way to extract a text value from an HTML tag using Symfony's DomCrawler?

Given the following HTML code snippet:

<div class="item">
  large
  <span class="some-class">size</span>
</div>

I'm looking for the best way to extract the string "large" using Symfony's Crawler.

$crawler = new Crawler($html);

Here I could call $crawler->html() and then apply a regex search. Is there a better solution? If so, how exactly would you do it?

Upvotes: 2

Views: 1142

Answers (3)

haxpanel

Reputation: 4678

I've just found a solution that looks the cleanest to me:

$crawler = new Crawler($html);
$result = $crawler->filterXPath('//text()')->text();
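One caveat, as far as I can tell: `//text()` matches every text node in the document, so taking the first match only gives "large" when the div comes first. A minimal sketch of the difference, using PHP's built-in DOMXPath rather than the Crawler so it runs without Symfony (the extra `<p>intro</p>` element is an assumption added purely for illustration):

```php
<?php
// Illustrative markup: a paragraph precedes the div from the question.
$html = '<p>intro</p><div class="item">
  large
  <span class="some-class">size</span>
</div>';

$doc = new \DOMDocument();
$doc->loadHTML($html);
$xpath = new \DOMXPath($doc);

// First text node anywhere in the document.
$first = $xpath->query('//text()')->item(0);

// First text node that is a direct child of the target div.
$scoped = $xpath->query('//div[@class="item"]/text()')->item(0);

echo trim($first->nodeValue), PHP_EOL;  // intro
echo trim($scoped->nodeValue), PHP_EOL; // large
```

The scoped expression should also be usable with `filterXPath()`, though I have only sketched it against the native DOM classes here.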

Upvotes: 4

COil

Reputation: 7596

$crawler = new Crawler($html);
$node = $crawler->filterXPath('//div[@class="item"]');
$domElement = $node->getNode(0);
foreach ($node->children() as $child) {
    $domElement->removeChild($child);
}
dump($node->text()); die();

Afterwards, you have to trim the surrounding whitespace.

Upvotes: 0

user4545769

Reputation:

This is a bit tricky, as the text you're trying to get is a text node, which the DomCrawler component doesn't (as far as I know) let you extract directly. Thankfully, DomCrawler is just a layer on top of PHP's DOM classes, which means you could probably do something like:

$crawler = new Crawler($html);
$crawler = $crawler->filterXPath('//div[@class="item"]');
$domNode = $crawler->getNode(0);
$text = null;

// childNodes includes text nodes; DOMNode has no "children" property
foreach ($domNode->childNodes as $domChild) {
    if ($domChild instanceof \DOMText) {
        $text = $domChild->wholeText;
        break;
    }
}

This wouldn't help with HTML like:

<div>
    text
    <span>hello</span>
    other text
</div>

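Because the loop stops at the first text node, you would only get "text", not "text other text", in this instance (and the value still carries the surrounding whitespace, so trim it). Take a look at the DOMText documentation for more details.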
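If you do need every direct text node, one option is to collect them all instead of breaking on the first. This is only a sketch: `directText()` is a hypothetical helper, not part of Symfony or PHP's DOM API, and it uses the native DOM classes so it runs on its own.

```php
<?php
// Hypothetical helper: joins the trimmed text of all direct text children.
function directText(\DOMNode $node): string
{
    $parts = [];
    foreach ($node->childNodes as $child) {
        // Only direct text children count; text inside nested elements is skipped.
        if ($child instanceof \DOMText) {
            $piece = trim($child->nodeValue);
            if ($piece !== '') {
                $parts[] = $piece;
            }
        }
    }
    return implode(' ', $parts);
}

$html = '<div>
    text
    <span>hello</span>
    other text
</div>';

$doc = new \DOMDocument();
$doc->loadHTML($html);
$div = $doc->getElementsByTagName('div')->item(0);

echo directText($div), PHP_EOL; // text other text
```

With the Crawler, the node returned by `$crawler->getNode(0)` could be passed to the same helper.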

Upvotes: 0
