A.Jesin

Reputation: 433

Regex for removing special characters on a multilingual string

The regex most commonly suggested for removing special characters seems to be this:

preg_replace( '/[^a-zA-Z0-9]/', '', $string );

The problem is that it also removes every non-English character, since anything outside a-z, A-Z and 0-9 is stripped.

Is there a regex that removes special characters in all languages? Or is the only solution to explicitly match each special character and remove it?

Upvotes: 6

Views: 1691

Answers (2)

Casimir et Hippolyte

Reputation: 89557

You can use this instead:

$string = preg_replace('/\P{Xan}+/u', '', $string);

\p{Xan} matches any character that is a letter or a number in any script in Unicode. It is a PCRE-specific property, the union of \p{L} and \p{N}.
\P{Xan} matches any character that is not a letter or a number. It is a shortcut for [^\p{Xan}].

Upvotes: 6

anubhava

Reputation: 785146

You can use:

$string = preg_replace( '/[^\p{L}\p{N}]+/u', '', $string );
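
One caveat worth checking against your own data: combining marks (Unicode category M, for example a decomposed accent or many vowel signs in Indic scripts) are neither \p{L} nor \p{N}, so the pattern above strips them. The sketch below is only an illustration under that assumption. The \p{M} variant and the variable names are additions for this example, not part of the pattern above:

```php
<?php
// "cafe" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form of "café"), then "!".
$decomposed = "cafe\u{0301}!";

// Pattern from this answer: the combining accent is removed along with the "!".
$lettersAndNumbers = preg_replace('/[^\p{L}\p{N}]+/u', '', $decomposed);

// Hypothetical variant that also keeps combining marks.
$withMarks = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '', $decomposed);

var_dump($lettersAndNumbers === 'cafe');        // bool(true)
var_dump($withMarks === "cafe\u{0301}");        // bool(true)
```

Whether marks should be kept depends on what counts as a "special character" for the languages you need to support.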

Upvotes: 3
