Mayank Kumar

Reputation: 69

Extract Text from within tags using RegExp PHP

I am trying to extract some strings from the source code of a web page, which looks like this:

<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>

I'm pretty sure those strings are the only things that end with a single line break (<br />). Everything else ends with two or more line breaks. I tried using this:

preg_match_all('~(.*?)<br />{1}~', $source, $matches);

But it doesn't work the way it's supposed to: it returns some other text along with those strings.

Upvotes: 1

Views: 216

Answers (4)

artnikpro

Reputation: 5869

This should work, please try it:

preg_match_all("/([^<>]*?)<br\s*\/?>/", $source, $matches);

or if your strings may contain some HTML code, use this one:

preg_match_all("/(.*?)<br\s*\/?>\\n/", $source, $matches);
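
For reference, here is a minimal sketch of the first pattern run against the sample from the question. The heredoc input and the $strings variable are only there for illustration. Note that the captured group keeps the newline in front of each string, so it probably needs a trim():

```php
<?php
// Sample input copied from the question.
$source = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

// Capture runs of non-tag characters that sit directly before a <br> tag.
preg_match_all('/([^<>]*?)<br\s*\/?>/', $source, $matches);

// Group 1 still carries the newline that precedes each string, so trim it.
$strings = array_map('trim', $matches[1]);

print_r($strings);
```

On a full page this will still pick up any other text that is directly followed by a <br> tag, so it only helps if the surrounding markup is as regular as the question assumes.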

Upvotes: -1

Ja͢ck

Reputation: 173552

DOMDocument and XPath to the rescue.

$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]') as $node) {
    echo $node->textContent;
}
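
If the strings are needed as separate values rather than one block of text, a possible variation (a sketch, not part of the snippet above) is to append /text() to the same XPath expression and collect each text node. The $strings array is a name chosen here for illustration:

```php
<?php
$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$strings = [];
// text() selects every text node that is a direct child of the matching <p>.
$query = '//p[contains(concat(" ", @class, " "), " someclass ")]/text()';
foreach ($xp->query($query) as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') { // skip nodes that hold only whitespace
        $strings[] = $value;
    }
}

print_r($strings);
```

The <br /> elements split the paragraph into separate text nodes, which is why each string comes back on its own.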


Upvotes: 3

Joe

Reputation: 15528

I wouldn't recommend using a regular expression to get the values. Instead, use PHP's built-in HTML parser like this:

$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

$elements = $xpath->query('//p[@class="someclass"]');
$text = array(); // to hold the strings
if ($elements !== false) { // query() returns false, not null, on failure
    foreach ($elements as $element) {
        $text[] = strip_tags($element->nodeValue);
    }
}
print_r($text); // print out all the strings

This is tested and working. You can read more about PHP's DOMDocument class here: http://www.php.net/manual/en/book.dom.php

Here's a demonstration: http://phpfiddle.org/lite/code/0nv-hd6 (click 'Run')

Upvotes: 2

tomsv

Reputation: 7277

Try this:

preg_match_all('~^(.*?)<br />$~m', $source, $matches);
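
A quick sketch of what the anchors do here, assuming the source uses plain \n line endings. The extra <div> line is a hypothetical addition to the question's sample, included only to show a <br /> that is not at the end of its line:

```php
<?php
// Sample from the question plus one hypothetical extra line for illustration.
$source = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
<div>Footer<br /></div>
EOM;

// With the m modifier, ^ and $ match at the start and end of every line,
// so only lines that END with <br /> are captured.
preg_match_all('~^(.*?)<br />$~m', $source, $matches);

print_r($matches[1]);
```

The "Footer" line is skipped because its <br /> is followed by more markup. Any other line on the page that ends with <br /> would still be captured, tags and all, since .*? is not restricted to plain text.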

Upvotes: -1
