Ram

Extract text from HTML

Actors: example world

How can I extract this "example world" text using a regular expression in PHP?

Upvotes: 0

Views: 244

Answers (2)

Gordon

Reputation: 317177

As Gumbo already pointed out in the comments on this question, and as you have been told in a number of your previous questions, regex isn't the right tool for parsing HTML.

The following uses DOM to get the first text node that follows any strong element with a class attribute of nfpd. In the example HTML, this is the text node containing ": example world".

Example HTML:

$html = <<< HTML
<p>
    <strong class="nfpd">Actors</strong>: example world <br />
    something else
</p>
HTML;

And extraction with DOM:

libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
libxml_clear_errors();

$nodes = $xPath->query('//strong[@class="nfpd"]/following-sibling::text()[1]');
foreach($nodes as $node) {
    echo $node->nodeValue; // : example world 
}

You can also do it without XPath, though it gets more verbose:

$nodes = $dom->getElementsByTagName('strong');
foreach($nodes as $node) {
    if($node->hasAttribute('class') &&
       $node->getAttribute('class') === 'nfpd' &&
       $node->nextSibling) {
        echo $node->nextSibling->nodeValue; // : example world 
    }
}

Removing the colon and whitespace is trivial: Use trim.
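As a minimal sketch of that cleanup step: trim() accepts an optional second argument listing the characters to strip from both ends of the string. The $raw value below is a hypothetical stand-in for the nodeValue extracted above, not output captured from the code.

```php
<?php
// Hypothetical sample value, standing in for $node->nodeValue from the DOM examples.
$raw = ": example world \n    ";

// The second argument lists the characters to strip from both ends:
// colon, space, tab, newline, carriage return, NUL byte and vertical tab.
$clean = trim($raw, ": \t\n\r\0\x0B");

echo $clean; // example world
```

Passing a character list replaces trim's default list, so the usual whitespace characters have to be repeated alongside the colon.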

Upvotes: 1

user142162

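// $text is the HTML string to search; the group ([^<]+) captures everything between the colon and the <br />.
preg_match('/<strong class="nfpd">Actors<\/strong>:([^<]+)<br \/>/', $text, $matches);

// $matches[1] holds the captured text, still with its surrounding whitespace.
print_r($matches);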

Upvotes: 1
