James

Reputation: 666

SIMPLE HTML DOM - how to ignore nested elements?

My HTML is as follows:

<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>

What I want to do is extract only 'i want this text', leaving all of the nested elements behind. I've tried several iterations of the following, but none of them returns the text I need:

$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);

Some guidance would be appreciated as the simple_html_dom section on filters is quite bare.

Upvotes: 0

Views: 576

Answers (1)

georgec20001

Reputation: 56

What about using PHP's preg_match (http://php.net/manual/en/function.preg-match.php)?

Try the following:

<?php

// Sample markup from the question.
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;

// Capture the line that follows the opening <span class="phone"> tag.
if (preg_match('#class="phone".*\n(.*)#', $html, $matches)) {
    echo trim($matches[1]);
}

?>

Regex explained: find the text class="phone", then match any characters up to the end of that line using .* (by default the dot does not match a newline). Then match the newline with \n and capture everything on the next line by enclosing .* in parentheses. This relies on the wanted text sitting on the line directly after the opening tag.

The result is stored in the array $matches. $matches[0] holds the text matched by the whole regex, while $matches[1] holds the text captured by the parentheses.
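
To make the difference between $matches[0] and $matches[1] concrete, here is a minimal sketch that runs the same pattern and dumps both entries. The shortened $html string and the $found variable are only for illustration and are not part of the original snippet. It also checks the return value of preg_match, which is 1 on a match, 0 when nothing matches, and false on error.

```php
<?php

// Shortened version of the markup from the question (illustrative only).
$html = "<span class=\"phone\">\ni want this text\n<span class=\"ignore-this-one\">01234567890</span>\n</span>";

$matches = [];
$found = preg_match('#class="phone".*\n(.*)#', $html, $matches);

if ($found === 1) {
    // $matches[0]: everything the whole pattern matched (it spans two lines).
    var_dump($matches[0]);
    // $matches[1]: only the text captured by the parentheses.
    var_dump($matches[1]);
} else {
    // 0 means no match; false means the pattern itself failed.
    echo "no match\n";
}
```

If the markup is indented or uses Windows line endings, the captured line may carry surrounding whitespace, so wrapping $matches[1] in trim() is probably a good idea.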

Upvotes: 1
