Reputation: 666
My html code is as follows
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
What I want to do is extract the 'i want this text' leaving all of the other elements behind. I've tried several iterations of the following, but none return the text I need:
$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);
Some guidance would be appreciated as the simple_html_dom section on filters is quite bare.
Upvotes: 0
Views: 576
Reputation: 56
what about using php preg_match (http://php.net/manual/en/function.preg-match.php)
try the below:
<?php
$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>;
EOF;
$result = preg_match('#class="phone".*\n(.*)#', $html, $matches);
echo $matches[1];
?>
regex explained: find text class="phone" then proceed until the end of the line, matching any character using *.. Then switch to a new line with \n and grab everything on that line by enclosing *. into brackets.
The returned result is stored in the array $matches. $matches[0] holds the value that is returned from the whole regex, while $matches[1] holds the value that is return by the closing brackets.
Upvotes: 1