Reputation: 2160
I'm using a service that returns a generated string. The strings usually look like this:
Hello Mr John Doe, you are now registered \t.
Hello &nbsp; Mr John Doe, your phone number is &nbsp; 555-555-555 &nbsp; \n
I need to remove all HTML entities and all \t, \n, and so on.
I can use html_entity_decode to get rid of non-breaking spaces, and str_replace to remove \t or \n, but is there a more general way? Something that guarantees nothing but alphabetic characters exist in the string (a string that doesn't contain any codes).
Upvotes: 3
Views: 296
Reputation: 146460
If I understood your case correctly, you basically want to convert from HTML to plain text.
Depending on the complexity of your input and the robustness and accuracy needed, you have a couple of options:
Use strip_tags() to remove HTML tags, mb_convert_encoding() with HTML-ENTITIES as the source encoding to decode entities, and either strtr() or preg_replace() for any additional replacements:
$html = "<p>Hello Mr John Doe, you are now registered.
Hello &nbsp; Mr John Doe, your phone number is &nbsp; 555-555-555 &nbsp;
Test: &euro;/&eacute;</p>";
$plain_text = $html;
// Remove HTML tags
$plain_text = strip_tags($plain_text);
// Decode HTML entities into UTF-8 characters
$plain_text = mb_convert_encoding($plain_text, 'UTF-8', 'HTML-ENTITIES');
// Turn tabs and line breaks into plain spaces
$plain_text = strtr($plain_text, [
    "\t" => ' ',
    "\r" => ' ',
    "\n" => ' ',
]);
// Collapse runs of whitespace into a single space
$plain_text = preg_replace('/\s+/u', ' ', $plain_text);
var_dump($html, $plain_text);
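On PHP 8.2 and later, passing HTML-ENTITIES to the mbstring functions is deprecated, so the same strip_tags() approach could be written with html_entity_decode() instead. This is only a sketch: the function name html_to_plain_text() is made up for illustration, and it assumes the input is UTF-8.

```php
<?php
// Hypothetical helper (not part of the original answer): strips tags,
// decodes entities with html_entity_decode(), then collapses whitespace.
// Assumes the input string is UTF-8.
function html_to_plain_text(string $html): string
{
    $text = strip_tags($html);

    // ENT_QUOTES decodes both quote styles; ENT_HTML5 selects the HTML5
    // entity table; the result is produced as UTF-8.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; decodes to U+00A0, so it is listed explicitly next to \s.
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;

    return trim($text);
}

echo html_to_plain_text("<p>Hello &nbsp; Mr John Doe,\tyou are now registered.\n</p>");
```

Wrapping the steps in one function keeps the order fixed: tags are stripped before decoding, so an encoded &lt;b&gt; in the text is not mistaken for markup.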
Use a proper DOM parser, plus maybe preg_replace() for further tweaking:
$html = "<p>Hello Mr John Doe, you are now registered.
Hello &nbsp; Mr John Doe, your phone number is &nbsp; 555-555-555 &nbsp;
Test: &euro;/&eacute;</p>";
$dom = new DOMDocument();
// Suppress warnings about imperfect markup while parsing
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
$plain_text = '';
// Concatenate every text node in document order
foreach ($xpath->query('//text()') as $textNode) {
    $plain_text .= $textNode->nodeValue;
}
// Collapse runs of whitespace into a single space
$plain_text = preg_replace('/\s+/u', ' ', $plain_text);
var_dump($html, $plain_text);
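One caveat with the DOM approach: DOMDocument::loadHTML() does not assume UTF-8 when the markup declares no charset, so literal non-ASCII characters such as € or é can come out garbled. A possible workaround, sketched below under the assumption that the input is UTF-8 and the mbstring extension is available, is to turn non-ASCII characters into numeric entities before parsing. The function name dom_plain_text() is hypothetical.

```php
<?php
// Hypothetical helper (not part of the original answer): extracts the text
// of an HTML fragment with DOMDocument. Assumes UTF-8 input and mbstring.
function dom_plain_text(string $html): string
{
    // Encode every non-ASCII character as a numeric entity so the parser
    // only ever sees ASCII and cannot misread the encoding.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Concatenate all text nodes, as in the snippet above.
    $text = '';
    foreach ((new DOMXPath($dom))->query('//text()') as $node) {
        $text .= $node->nodeValue;
    }

    // U+00A0 (from &nbsp;) is listed explicitly next to \s.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;

    return trim($text);
}

echo dom_plain_text("<p>Test: €/é &nbsp; done</p>");
```

Restoring the previous libxml error setting, rather than forcing it to false, avoids changing behaviour for other code that relies on it.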
Both solutions should print something like this:
string(169) "<p>Hello Mr John Doe, you are now registered.
Hello Mr John Doe, your phone number is 555-555-555
Test: €/é</p>"
string(107) "Hello Mr John Doe, you are now registered. Hello Mr John Doe, your phone number is 555-555-555 Test: €/é"
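If you also want some assurance that no stray control codes survive, a whitelist pass can be added once the text has been decoded. The sketch below keeps letters, digits, punctuation, symbols and spaces and drops everything else; the function name keep_readable_chars() and the exact set of allowed character classes are assumptions you would adjust to your data.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every
// character that is not a letter, digit, punctuation mark, symbol or
// whitespace, then collapses whitespace. Assumes UTF-8 input.
function keep_readable_chars(string $text): string
{
    // \p{L} letters, \p{N} numbers, \p{P} punctuation, \p{S} symbols (e.g. €).
    $text = preg_replace('/[^\p{L}\p{N}\p{P}\p{S}\s]+/u', '', $text) ?? $text;

    // Tabs and line breaks pass the whitelist as whitespace and are
    // normalised to single spaces here.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return trim($text);
}

echo keep_readable_chars("Hello\x00 Mr\x07 Doe,\t555-555 €");
```

A whitelist is stricter than listing characters to remove with str_replace(): anything not explicitly allowed is dropped, including control codes you did not anticipate.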
Upvotes: 2