Reputation: 7053
I am pulling a few paragraphs from a database and trying to separate them into an array, using regex and various classes, but nothing works.
I tried to do this:
public function get_first_para(){
$doc = new DOMDocument();
$doc->loadHTML($this->review);
foreach($doc->getElementsByTagName('p') as $paragraph) {
echo $paragraph."<br/><br/><br/>";
}
}
But I get this:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : p in Entity, line: 9 in C:\Inetpub\vhosts\bestcamdirectory.com\httpdocs\sandbox\model\ReviewContentExtractor.php on line 18
Catchable fatal error: Object of class DOMElement could not be converted to string in C:\Inetpub\vhosts\bestcamdirectory.com\httpdocs\sandbox\model\ReviewContentExtractor.php on line 20
Why do I get these messages? Is there an easy way to extract all the paragraphs from a string?
UPDATE:
public function get_first_para(){
$pattern="/<p>(.+?)<\/p>/i";
preg_match_all($pattern,$this->review,$matches,PREG_PATTERN_ORDER);
return $matches;
}
I would prefer the second way, but it doesn't work well either.
Upvotes: 2
Views: 4693
Reputation: 20753
DOMDocument::getElementsByTagName returns a DOMNodeList object, which is iterable but is not an array. Inside the foreach, the $paragraph variable is an instance of DOMElement, so simply using it as a string won't work (as the error explains).
What you want is the text content of the DOMElement, which is available through its textContent property (inherited from the DOMNode class):
foreach($doc->getElementsByTagName('p') as $paragraph) {
echo $paragraph->textContent."<br/><br/><br/>"; // for text only
}
Or, if you need the full markup of the node, pass it to DOMDocument::saveHTML (the node argument requires PHP 5.3.6 or later):
foreach($doc->getElementsByTagName('p') as $paragraph) {
echo $doc->saveHTML($paragraph)."<br/><br/><br/>\n"; // with the <p> tag
// without the <p>:
// if you don't need the enclosing <p> tag, iterate through its child nodes and output each one
foreach ($paragraph->childNodes as $cnode) {
echo $doc->saveHTML($cnode);
}
}
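Since the question asks for an array rather than echoed output, the two approaches above can be combined into one helper. This is only a minimal sketch: the function name extract_paragraphs and the sample markup are made up for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original post): collects every <p>
// both as plain text and as inner HTML.
function extract_paragraphs(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $result = [];
    foreach ($doc->getElementsByTagName('p') as $paragraph) {
        // Rebuild the inner HTML by serializing each child node in turn.
        $inner = '';
        foreach ($paragraph->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $result[] = [
            'text' => $paragraph->textContent, // tags stripped
            'html' => $inner,                  // markup kept, outer <p> dropped
        ];
    }
    return $result;
}

// Sample input, invented for this example.
$paragraphs = extract_paragraphs('<p>First <b>one</b></p><p>Second</p>');
```

Returning both forms per paragraph lets the caller choose later; if only one is needed, the other key can simply be dropped.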
As for your loadHTML warning: the HTML input is invalid (the parser found a closing </p> tag it did not expect). You can suppress these warnings with:
libxml_use_internal_errors(true); // before loading the html content
If you need these errors, see the libxml's error handling part of the manual.
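If you want to inspect the parser errors instead of discarding them, something along these lines should work. It is a sketch only: the sample markup with its stray </p> is invented, and which messages libxml reports (if any) depends on the libxml version in use.

```php
<?php
// Invented sample with a stray closing </p>, similar to what the warning describes.
$html = '<p>First</p></p><p>Second</p>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Turn the collected LibXMLError objects into readable strings.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Restoring the previous setting keeps the change local, so other code that relies on the default warning behaviour is unaffected.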
Since you insist on a regex, here is how you could go about it:
preg_match_all('!<p>(.+?)</p>!sim',$html,$matches,PREG_PATTERN_ORDER);
The pattern modifiers: s lets the . match line endings, i makes the match case-insensitive, and m makes ^ and $ match at line boundaries (it has no effect here, since the pattern has no anchors).
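For illustration, here is a slightly more tolerant variant run against a small made-up sample. Note that it differs from the pattern above: it also accepts attributes on the opening tag, and it drops the m modifier. Like any regex approach, it will not cope with nested or malformed markup.

```php
<?php
// Invented sample: a paragraph spanning two lines and an uppercase tag with an attribute.
$html = "<p>First\nparagraph</p>\n<P class=\"x\">Second</P>";

// <p\b[^>]*> accepts optional attributes; (.*?) is non-greedy so each
// paragraph is matched separately; s lets . cross newlines, i ignores case.
preg_match_all('!<p\b[^>]*>(.*?)</p>!si', $html, $matches, PREG_PATTERN_ORDER);

// With PREG_PATTERN_ORDER, $matches[1] holds the first capture group of every match.
$paragraphs = $matches[1];
```

Using ! as the delimiter avoids having to escape the / in the closing tag.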
Upvotes: 4