Reputation: 792
I want to remove all HTML entities such as &quot; &euro; &aacute; ... from a string using a regex.
String: "This is a string &quot; &euro; &aacute; &amp;"
Required output: This is a string
Upvotes: 1
Views: 6518
Reputation: 3696
Try this:
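preg_replace('/[^\w\s]+/', '', html_entity_decode($string, ENT_QUOTES, 'UTF-8'));
This decodes every entity first and then deletes everything that is not a word character or whitespace, so it may also remove characters you want to keep (ordinary punctuation, for example). You may need to adjust the regex.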
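A minimal sketch of the decode-then-strip idea, assuming UTF-8 input. It uses html_entity_decode() rather than htmlspecialchars_decode(), because the latter only handles a few entities (&amp;, &quot;, &lt;, &gt; and quotes), so &euro; would be left as the plain word "euro" after the regex runs. The function name strip_after_decode is hypothetical.

```php
<?php
// Hypothetical helper: decode every entity, then drop anything that is not
// an ASCII word character or whitespace. With the default C locale and no /u
// modifier, \w only matches ASCII letters, digits and underscore, so the
// UTF-8 bytes of the decoded euro sign and a-acute are removed too (as is
// ordinary punctuation).
function strip_after_decode(string $s): string
{
    $decoded = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // trim() removes the spaces that separated the entities.
    return trim(preg_replace('/[^\w\s]+/', '', $decoded));
}

echo strip_after_decode('This is a string &quot; &euro; &aacute; &amp;'), "\n";
// prints "This is a string"
```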
Upvotes: 0
Reputation: 17028
preg_replace('#&[^;]+;#', '', "This is a string &quot; &euro; &aacute; &amp;");
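A quick check of the #&[^;]+;# pattern on the question's string, plus one caveat: because [^;]+ also matches spaces, a bare ampersand in running text is treated as the start of an entity up to the next semicolon. The sample 'Tom & Jerry; fine' is made up for illustration.

```php
<?php
// Pattern from the answer: "&", one or more non-";" characters, then ";".
$pattern = '#&[^;]+;#';

$input = 'This is a string &quot; &euro; &aacute; &amp;';
// The entities are removed but the spaces between them remain, hence trim().
echo trim(preg_replace($pattern, '', $input)), "\n";        // prints "This is a string"

// Caveat: "& Jerry;" is matched as if it were an entity, so text is lost.
echo preg_replace($pattern, '', 'Tom & Jerry; fine'), "\n"; // prints "Tom  fine"
```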
Upvotes: 0
Reputation: 869
You can try:
$str="This is a string &quot; &euro; &aacute; &amp;";
$new_str = preg_replace("/&#?[a-z0-9]+;/i",'',$str);
echo $new_str;
I hope this works.
Explanation:
& - starts with an ampersand
#? - an optional #, since numeric entities such as &#8364; use it
[a-z0-9]+ - followed by one or more letters or digits
; - ends with a semicolon
i - case-insensitive modifier
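The pattern /&#?[a-z0-9]+;/i applied to named, decimal and hexadecimal forms of the same character, and to a bare ampersand. The sample strings are illustrative only.

```php
<?php
// "&", optional "#", one or more letters/digits, ";" (case-insensitive).
$pattern = '/&#?[a-z0-9]+;/i';

// Named, decimal and hexadecimal euro signs are all matched
// ("x20AC" is just letters and digits as far as this pattern is concerned).
echo preg_replace($pattern, '', 'Price: &euro;5 &#8364;5 &#x20AC;5'), "\n";
// prints "Price: 5 5 5"

// The character class excludes spaces, so a bare "&" in text is left alone.
echo preg_replace($pattern, '', 'Tom & Jerry; fine'), "\n";
// prints "Tom & Jerry; fine"
```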
Upvotes: 2
Reputation: 141935
$str = preg_replace_callback('/&[^; ]+;/', function($matches){
return html_entity_decode($matches[0], ENT_QUOTES) == $matches[0] ? $matches[0] : '';
}, $str);
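This will work, but depending on your PHP version it may not strip &euro;. Before PHP 5.4, html_entity_decode() defaults to the ISO-8859-1 charset, which has no euro sign, so &euro; comes back unchanged and the callback keeps it. On PHP 5.4 or later the default charset is UTF-8, so it is decoded and removed; passing 'UTF-8' as the third argument makes this explicit. On PHP 5.4+ you can also use the flags ENT_QUOTES | ENT_HTML5 so that HTML5-only entities such as &apos; are recognized as well.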
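A sketch of the callback approach with the charset passed explicitly and ENT_HTML5 added, assuming PHP 7+ (for the type declarations) and UTF-8 text. The function name strip_known_entities and the sample "&bogus;" are hypothetical; the point is that only sequences html_entity_decode() actually recognizes are removed.

```php
<?php
// Remove only "&...;" sequences that html_entity_decode() recognizes as
// entities; anything that merely looks like one is kept as-is.
function strip_known_entities(string $str): string
{
    return preg_replace_callback('/&[^; ]+;/', function (array $m): string {
        $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        // Unchanged after decoding means it is not a known entity: keep it.
        return $decoded === $m[0] ? $m[0] : '';
    }, $str);
}

$out = strip_known_entities('This is a string &quot; &euro; &aacute; &amp; &bogus;');
echo $out, "\n"; // the four real entities are gone, "&bogus;" survives
```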
Upvotes: 0
Reputation: 2412
If you want to remove entities entirely (i.e., not decode them), try this:
$string = 'This is a string &quot; &euro; &aacute; &amp;';
$pattern = '/&([#0-9A-Za-z]+);/';
echo preg_replace($pattern, '', $string);
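Removing the entities leaves the spaces that separated them, so the result is "This is a string" followed by trailing spaces. A minimal follow-up, assuming it is acceptable to collapse all whitespace (including newlines) to single spaces, reaches the required output exactly. The function name remove_entities is hypothetical.

```php
<?php
// Strip entities with the answer's pattern, then collapse leftover runs of
// whitespace to one space and trim both ends.
function remove_entities(string $string): string
{
    $pattern = '/&([#0-9A-Za-z]+);/';
    $without = preg_replace($pattern, '', $string);
    return trim(preg_replace('/\s+/', ' ', $without));
}

echo remove_entities('This is a string &quot; &euro; &aacute; &amp;'), "\n";
// prints "This is a string"
```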
Upvotes: 0