Reputation: 27195
I have the following string:
<p>&nbsp;Hello World, this is StackOverflow's question details page</p>
I want to extract the text from the HTML above as "Hello World, this is StackOverflow's question details page".
Note that I want to remove the &nbsp; as well.
How can we achieve this in PHP? I tried a few functions, such as strip_tags and html_entity_decode, but all of them fail under some conditions.
Please help, thanks!
Edit: the code I am trying is below, but it's not working :( It leaves &nbsp; and encoded characters such as the apostrophe in the output.
$TMP_DESCR = trim(strip_tags($rs['description']));
Upvotes: 0
Views: 546
Reputation: 9299
The code below worked for me. I had to do a str_replace on the non-breaking space, though.
$string = "<p>&nbsp;Hello World, this is StackOverflow's question details page</p>";
echo htmlspecialchars_decode(trim(strip_tags(str_replace('&nbsp;', '', $string))), ENT_QUOTES);
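The one-liner above can be wrapped in a small helper so it is easy to reuse, for example inside a database loop. This is a minimal sketch, assuming the apostrophe arrives encoded as &#039; (which is what htmlspecialchars() produces with ENT_QUOTES); the name clean_description is hypothetical. One deliberate change from the answer: &nbsp; is replaced with a regular space instead of being removed, so text like Hello&nbsp;World does not collapse into HelloWorld, and trim() then drops the leading one.

```php
<?php
// Hypothetical helper name; wraps the str_replace/strip_tags/trim/decode chain.
function clean_description(string $html): string
{
    // Turn each &nbsp; entity into a plain space so adjacent words stay separated.
    $text = str_replace('&nbsp;', ' ', $html);
    // Remove tags, trim the outer whitespace, then decode &amp;, &quot;, &#039;, &lt;, &gt;.
    return htmlspecialchars_decode(trim(strip_tags($text)), ENT_QUOTES);
}

echo clean_description("<p>&nbsp;Hello World, this is StackOverflow&#039;s question details page</p>"), "\n";
// Hello World, this is StackOverflow's question details page
```

Keep in mind that htmlspecialchars_decode() only handles the five special-character entities; named entities such as &eacute; or &rsquo; stay as they are, which is where html_entity_decode() comes in.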
Upvotes: 1
Reputation: 238035
Probably the nicest and most reliable way to do this is with genuine (X|HT)ML parsing functions like the DOMDocument class:
<?php
$str = "<p>&nbsp;Hello World, this is StackOverflow's question details page</p>";
$dom = new DOMDocument;
$dom->loadXML(str_replace('&nbsp;', ' ', $str));
echo trim($dom->firstChild->nodeValue);
// "Hello World, this is StackOverflow's question details page"
This is probably slight overkill for this problem, but using the proper parsing functionality is a good habit to get into.
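loadXML() only works while the description is well-formed XML; a <br> without a closing slash or a bare & makes the parse fail. Below is a sketch that uses loadHTML() instead, under the assumption that the input is ASCII or entity-encoded (loadHTML() treats input as ISO-8859-1 unless the markup declares a charset). The name html_to_text is hypothetical. Because the HTML parser decodes &nbsp; itself, to U+00A0, the str_replace() is no longer needed, but trim() does not remove that character, so a Unicode-aware preg_replace() does the trimming.

```php
<?php
// Hypothetical helper: HTML fragment in, plain text out, via DOMDocument::loadHTML().
function html_to_text(string $html): string
{
    // loadHTML() rejects an empty string (ValueError in PHP 8), so short-circuit it.
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();
    // Collect libxml warnings (unknown tags, sloppy markup) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser wraps the fragment in <html><body>; textContent joins every text
    // node below the root, with entities such as &nbsp; and &#039; already decoded.
    $text = $dom->documentElement->textContent;

    // &nbsp; became U+00A0, which trim() ignores, so trim with a Unicode-aware regex.
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
}

echo html_to_text("<p>&nbsp;Hello World, this is StackOverflow&#039;s question details page</p>"), "\n";
// Hello World, this is StackOverflow's question details page
```

The DOMDocument can still be created once outside a loop, as in the edit below; it is created per call here only to keep the helper self-contained.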
Edit: You can reuse the DOMDocument object, so you only need two lines within the loop:
$dom = new DOMDocument;
while ($rs = mysql_fetch_assoc($result)) { // or whatever fetch loop you use; mysql_* was removed in PHP 7
$dom->loadXML(str_replace('&nbsp;', ' ', $rs['description']));
$TMP_DESCR = trim($dom->firstChild->nodeValue);
// do something with $TMP_DESCR
}
Upvotes: 0
Reputation: 1672
First, you'll have to call trim() on the HTML to remove the whitespace: http://php.net/manual/en/function.trim.php
Then strip_tags, then html_entity_decode.
So: html_entity_decode(strip_tags(trim($html)));
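With input like <p>&nbsp;Hello World</p>, calling trim() first has nothing to remove, because there is no whitespace at either end until the entity is decoded, and html_entity_decode() then turns &nbsp; into U+00A0, which trim() does not treat as whitespace. A minimal sketch of the same three functions reordered, plus a str_replace() that maps U+00A0 to a plain space; strip_and_decode is a hypothetical name, and UTF-8 output is assumed.

```php
<?php
// Hypothetical helper: the answer's three functions, reordered so trim() runs last.
function strip_and_decode(string $html): string
{
    // 1. Remove the markup; entity text such as &nbsp; is left as-is.
    $text = strip_tags($html);
    // 2. Decode every HTML entity; ENT_QUOTES is needed for &#039; to become '.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 3. &nbsp; is now U+00A0; map it to a regular space so trim() can see it.
    $text = str_replace("\u{00A0}", ' ', $text);
    // 4. Trim the ordinary whitespace at both ends.
    return trim($text);
}

echo strip_and_decode("<p>&nbsp;Hello World, this is StackOverflow&#039;s question details page</p>"), "\n";
// Hello World, this is StackOverflow's question details page
```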
Upvotes: 0
Reputation: 25049
strip_tags() will get rid of the tags, and trim() should get rid of the whitespace. I'm not sure whether it will work with non-breaking spaces, though.
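A quick check of that doubt: trim() does not remove non-breaking spaces in either form. The entity &nbsp; is just text to it, and once decoded, the character U+00A0 is not in trim()'s default character list (space, \t, \n, \r, \0, \x0B). The sketch below shows both cases and one workaround, a preg_replace() with the u modifier; the sample string is illustrative.

```php
<?php
$html = "<p>&nbsp;Hello World</p>";

// strip_tags() removes the <p> tags, but the entity text stays, and trim() leaves it alone.
$text = trim(strip_tags($html));
var_dump($text); // string(17) "&nbsp;Hello World"

// After decoding, trim() still keeps U+00A0, because it is not in the default list.
$decoded = trim(html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
var_dump($decoded === "\u{00A0}Hello World"); // bool(true)

// A Unicode-aware regex trims U+00A0 as well as ordinary whitespace.
$clean = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $decoded);
var_dump($clean); // string(11) "Hello World"
```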
Upvotes: 0