I need to replace everything in a string that is not a word character, space, comma, period, question mark, exclamation mark, asterisk or &#039;. I'm trying to do it using preg_replace, but I'm not getting the correct results:
$string = "i don&#039;t know if i can do this,.?!*!@#$%^&()_+123|";
preg_replace("~(?![\w\s]+|[\,\.\?\!\*]+|'|)~", "", $string);
echo $string;
Result:
i don't know if i can do this,.?!!*@#$%^&()_+123|
Needed result:
i don't know if i can do this,.?!*
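The attempt above leaves the string untouched for two independent reasons, which a short sketch can confirm (PHP 7 or later assumed; the [^a-z ] pattern is only a placeholder to show the first problem). First, preg_replace returns a new string and never modifies its subject in place. Second, the pattern is a zero-width negative lookahead whose last alternative is empty; an empty alternative matches everywhere, so the lookahead fails at every position and nothing is ever replaced.

```php
<?php
$string = "i don't know if i can do this,.?!*!@#$%^&()_+123|";
$original = $string;

// Problem 1: the return value is discarded, so $string is not changed.
preg_replace("~[^a-z ]~", "", $string);
var_dump($string === $original); // bool(true)

// Problem 2: even when assigned, this pattern matches nothing.
// (?!...|) contains an empty alternative that always matches,
// so the negative lookahead always fails.
$attempt = preg_replace("~(?![\w\s]+|[\,\.\?\!\*]+|'|)~", "", $string);
var_dump($attempt === $original); // bool(true)
```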
Upvotes: 3
I don't know if you're happy to call html_entity_decode first to convert that &#039; into an apostrophe. If you are, then probably the simplest way to achieve this is:
// Convert HTML entities to characters
$string = html_entity_decode($string, ENT_QUOTES);
// Remove characters other than the specified list.
$string = preg_replace("~[^\w\s,.?!*']+~", "", $string);
// Convert characters back to HTML entities. This will convert the ' back to &#039;
$string = htmlspecialchars($string, ENT_QUOTES);
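A minimal end-to-end run of the three calls above, assuming PHP 7 or later and using the question's input with the &#039; entity (the variable names are illustrative). It also shows one detail worth knowing: \w matches digits and the underscore as well as letters, so "_123" survives; listing a-zA-Z explicitly drops them.

```php
<?php
$input = "i don&#039;t know if i can do this,.?!*!@#$%^&()_+123|";

// Decode, filter, re-encode, as in the answer.
$decoded = html_entity_decode($input, ENT_QUOTES);         // &#039; -> '
$kept    = preg_replace("~[^\w\s,.?!*']+~", "", $decoded); // drop disallowed runs
$result  = htmlspecialchars($kept, ENT_QUOTES);            // ' -> &#039;
echo $result, "\n"; // i don&#039;t know if i can do this,.?!*!_123

// Variant: letters only, so digits and the underscore are removed too.
$lettersOnly = htmlspecialchars(
    preg_replace("~[^a-zA-Z\s,.?!*']+~", "", $decoded),
    ENT_QUOTES
);
echo $lettersOnly, "\n"; // i don&#039;t know if i can do this,.?!*!
```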
If not, then you'll need to use some negative assertions: remove & when not followed by #, &# when not followed by 039;, # when not preceded by &, and ; when not preceded by &#039.
$string = preg_replace("~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~", "", $string);
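A quick check of the single-regex version, on a made-up input chosen to hit each assertion (PHP 7 or later assumed). The &#039; sequence survives intact, while a stray &, a stray #, the &# of some other numeric entity, and a ; that does not close &#039; are all removed. The digits of the other entity remain because they match \w.

```php
<?php
// Character class drops disallowed characters; the assertions strip
// &, #, and ; unless they are part of &#039;.
$pattern = "~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~";

$input  = "it&#039;s a&b #c x;y &#38;!@";
$output = preg_replace($pattern, "", $input);
echo $output, "\n"; // it&#039;s ab c xy 38!
```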
The results are subtly different. When given &quot;, the first block of code converts it to " and then removes it from the string. The second block removes the & and ; and leaves quot behind in the result.
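To see that difference concretely, here is a sketch running both blocks on a made-up string containing &quot; and &#039; (PHP 7 or later assumed):

```php
<?php
$input = "say &quot;hi&quot; don&#039;t";

// First block: &quot; is decoded to " and then filtered out.
$first = htmlspecialchars(
    preg_replace("~[^\w\s,.?!*']+~", "", html_entity_decode($input, ENT_QUOTES)),
    ENT_QUOTES
);

// Second block: works on the encoded text, so only & and ; are removed.
$second = preg_replace(
    "~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~",
    "",
    $input
);

echo $first, "\n";  // say hi don&#039;t
echo $second, "\n"; // say quothiquot don&#039;t
```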
Upvotes: 1