hashi

Reputation: 77

Regex to replace punctuation

I've been trying for a few hours to get this to work to the effect I need but nothing works quite like it should. I'm building a discussion board type thing and have made a way to tag other users by putting @username in the post text.

Currently I have this code to strip anything that wouldn't be part of the username once the tags have already been pulled out of the entire text:

$name= preg_replace("/[^A-Za-z0-9_]/",'',$name);

This works well because it correctly handles names written as, for example, (@username), @username:, or @username, some text (that is, it removes the ",", ":", and ")").

However, this does not work when the user has non-ASCII characters in their username. For example, if it's @üsername, the line above returns sername, which is not useful.

Is there a way, using preg_replace, to still strip this extra punctuation but retain any non-ASCII letters?

Any help is much appreciated :)

Upvotes: 1

Views: 853

Answers (2)

anubhava

Reputation: 784958

To detect punctuation characters, you can use the Unicode property \p{P} instead, together with the u modifier so that the pattern and the input are treated as UTF-8:

$name = preg_replace('/[\p{P} ]+/u', '', $name);

RegEx Demo
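
One caveat worth illustrating: \p{P} includes the connector punctuation category (Pc), which contains the underscore, so the pattern above also strips "_" from usernames. The sketch below compares the plain pattern with a variant that spares the underscore through a negative lookahead. The function names are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helpers for illustration; input is assumed to be valid UTF-8.

// Removes every punctuation character and space, underscore included,
// because "_" belongs to the Pc (connector punctuation) category.
function stripPunctuation(string $name): string
{
    // preg_replace returns null on failure (e.g. malformed UTF-8 with /u).
    return preg_replace('/[\p{P} ]/u', '', $name) ?? '';
}

// Same idea, but the lookahead (?!_) refuses to match at an underscore,
// so underscores survive while other punctuation is removed.
function stripPunctuationKeepUnderscore(string $name): string
{
    return preg_replace('/(?!_)[\p{P} ]/u', '', $name) ?? '';
}

echo stripPunctuation('(@üser_name):'), "\n";               // üsername
echo stripPunctuationKeepUnderscore('(@üser_name):'), "\n"; // üser_name
```

Whether underscores should survive depends on the board's username rules; the question's original pattern kept them.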

Upvotes: 1

Aleksei Matiushkin

Reputation: 121000

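You are entering the territory of Unicode regexps. Note that PCRE, which PHP uses, expects the short property names (\p{L} for letters, \p{N} for numbers) rather than long forms such as \p{Letter}:

$name = preg_replace('/[^\p{L}\p{N}_]/u', '', $name);

or the other way round: match the punctuation you want to remove instead of negating the characters you want to keep. The link I provided contains more examples.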
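
As a minimal sketch of the whitelist approach applied to the inputs from the question, assuming the tag text is valid UTF-8 (the helper name cleanUsername is hypothetical):

```php
<?php
// Hypothetical helper; keeps Unicode letters, Unicode numbers and "_".
function cleanUsername(string $raw): string
{
    // The u modifier makes PCRE read the pattern and subject as UTF-8,
    // so \p{L} and \p{N} match whole code points rather than single bytes.
    $clean = preg_replace('/[^\p{L}\p{N}_]/u', '', $raw);

    // preg_replace returns null on error, e.g. when $raw is not valid UTF-8.
    return $clean ?? '';
}

echo cleanUsername('(@üsername)'), "\n";  // üsername
echo cleanUsername('@user_name:'), "\n";  // user_name
```

Without the u modifier, the two bytes that encode "ü" are each treated as separate characters, which is how the ASCII-only pattern in the question ends up producing "sername".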

Upvotes: 4
