frosty

Reputation: 2669

Using preg_replace not working properly

I need to remove everything from a string that is not a word character, space, comma, period, question mark, exclamation mark, asterisk, or &#039;. I'm trying to do it with preg_replace, but I'm not getting the correct results:

$string = "i don&#039;t know if i can do this,.?!*!@#$%^&()_+123|";
preg_replace("~(?![\w\s]+|[\,\.\?\!\*]+|&#039;|)~", "", $string);

echo $string;

Result:

i don&#039;t know if i can do this,.?!!*@#$%^&()_+123|

Need Result:

i don&#039;t know if i can do this,.?!*

Upvotes: 3

Views: 834

Answers (1)

Matt Raines

Reputation: 4218

I don't know whether you're happy to call html_entity_decode first to convert that &#039; into an apostrophe. If you are, then probably the simplest way to achieve this is:

// Convert HTML entities to characters
$string = html_entity_decode($string, ENT_QUOTES);
// Remove characters other than the specified list.
$string = preg_replace("~[^\w\s,.?!*']+~", "", $string);
// Convert characters back to HTML entities. This will convert the ' back to &#039;
$string = htmlspecialchars($string, ENT_QUOTES);
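
For reference, here is a minimal sketch of that approach run against the string from the question. It also makes one detail visible: \w matches digits and the underscore as well as letters, so _ and 123 survive. If only letters should be kept, swapping \w for a-zA-Z is one option; that variant is an assumption about the intent, and the variable names are purely illustrative.

```php
<?php
// Input from the question, with the apostrophe stored as the entity &#039;.
$string = 'i don&#039;t know if i can do this,.?!*!@#$%^&()_+123|';

// Decode first so the entity becomes a plain apostrophe.
$decoded = html_entity_decode($string, ENT_QUOTES);

// preg_replace() returns the new string; it does not modify its argument,
// so the result has to be assigned.
$withWordChars = preg_replace("~[^\w\s,.?!*']+~", "", $decoded);

// Assumed variant: keep ASCII letters only instead of \w.
$lettersOnly = preg_replace("~[^a-zA-Z\s,.?!*']+~", "", $decoded);

echo htmlspecialchars($withWordChars, ENT_QUOTES), "\n";
// i don&#039;t know if i can do this,.?!*!_123
echo htmlspecialchars($lettersOnly, ENT_QUOTES), "\n";
// i don&#039;t know if i can do this,.?!*!
```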

If not, then you'll need to use some negative assertions to remove & when not followed by #, ; when not preceded by &#039, and so on.

$string = preg_replace("~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~", "", $string);

The results are subtly different. The first block of code, when provided &quot;, will convert it to " and then remove it from the string. The second block will remove & and ; and leave quot behind in the result.
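
The difference can be checked with a short sketch that runs both patterns on the same input. The function names cleanViaDecode and cleanInPlace are hypothetical wrappers introduced here for illustration; the patterns themselves are the two given above.

```php
<?php
// Hypothetical wrapper: decode, strip, then re-encode.
function cleanViaDecode(string $s): string {
    $s = html_entity_decode($s, ENT_QUOTES);
    $s = preg_replace("~[^\w\s,.?!*']+~", "", $s);
    return htmlspecialchars($s, ENT_QUOTES);
}

// Hypothetical wrapper: strip in place, keeping only the &#039; entity intact.
function cleanInPlace(string $s): string {
    return preg_replace(
        "~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~",
        "",
        $s
    );
}

$input = "say &quot;hi&quot;, don&#039;t!";

echo cleanViaDecode($input), "\n";
// say hi, don&#039;t!
echo cleanInPlace($input), "\n";
// say quothiquot, don&#039;t!
```

In the second result only the & and ; of each &quot; are removed, so the letters quot stay in the text.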

Upvotes: 1
