Reputation: 3701
I am about to pull my hair out over this issue, and I hope someone has a solution. I have an HTML string:
$html = '<div id="main">What is going on </div><div>یہاں
تو کوئی ہ</div>
<span>Some More Text &lt;good&gt;</span>';
This is a mixed HTML string containing named HTML entities, English characters, and numeric character references for Unicode characters. I want to convert only the numeric character references to the actual Unicode characters. The string also contains user formatting that I do not want to lose.
I want the following output
$html = '<div id="main">What is going on </div><div>‘۔سلطان محمود نے گاڑی روکتے ہوئے</div>
<span>Some More Text &lt;good&gt;</span>';
I have used
html_entity_decode($html, ENT_COMPAT, 'utf-8');
but this also converts &lt; to < and &gt; to >, which I do not want.
Is there any other solution?
Note: I am not asking about Unicode characters being displayed incorrectly on my webpage. They display fine, because the browser renders the numeric references as the real Unicode characters. But I want the actual Unicode characters in the page source too.
Upvotes: 1
Views: 108
Reputation: 3486
Try using preg_replace_callback with html_entity_decode as the callback, so that only the matched numeric references are decoded.
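// Decode one matched numeric reference (e.g. &#1740;) to its UTF-8 character.
$decode_single_entity = function ($matches) {
    return html_entity_decode($matches[0], ENT_COMPAT, 'UTF-8');
};
// Only decimal numeric references match; named entities such as &lt; are left as-is.
$string = preg_replace_callback('/&#\d+;/', $decode_single_entity, $html);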
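For a self-contained check, here is a minimal sketch of the same idea wrapped in a function. The function name decode_numeric_entities and the sample string are hypothetical, not from the question. As an assumption beyond the original pattern, the regex also accepts hexadecimal references such as &#x6C1;, in case the input contains them.

```php
<?php
// Hypothetical helper: decodes only numeric character references.
function decode_numeric_entities(string $html): string
{
    // Matches decimal (&#1740;) and hexadecimal (&#x6C1;) references.
    // Named entities such as &lt; or &amp; do not match and stay untouched.
    $result = preg_replace_callback(
        '/&#(?:\d+|x[0-9a-f]+);/i',
        static function (array $m): string {
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

// Illustrative input: two numeric references plus named entities to preserve.
$sample = '<div>&#1740;&#x6C1;</div><span>&lt;good&gt;</span>';
echo decode_numeric_entities($sample), "\n";
```

One caveat: a numeric reference that encodes a markup character (for example &#60; for <) would also be decoded by this sketch, so input that relies on such references for escaping would need an extra check in the callback.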
Upvotes: 1