KodeFor.Me

Reputation: 13511

Regular expression for UTF-8 words

I am creating a shopping cart and I have a little issue with a regex.

What I would like to do is validate product titles, allowing the end user to use the following characters:

words, spaces, colons (:), periods (.) and hyphens (-)

My current regex is:

/^[\w \-\.\:]+$/i

But when I paste UTF-8 characters, for example Greek, Chinese or Russian characters, this regex fails.
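To see the failure concretely, here is a minimal sketch in TypeScript. The language is an assumption (the question does not name its host language), and JavaScript's regex engine differs from PCRE in some details. Without Unicode-aware classes, \w only covers ASCII letters, digits and underscore, so a single Greek, Cyrillic or CJK letter makes the whole match fail. The sample titles are hypothetical.

```typescript
// Equivalent to the question's /^[\w \-\.\:]+$/i; the escapes are not
// needed inside a character class, so they are dropped here.
// Without the u flag, \w means [A-Za-z0-9_], so non-ASCII letters are rejected.
const originalPattern: RegExp = /^[\w .:-]+$/i;

// Hypothetical sample product titles, for illustration only.
const titles: string[] = [
  "USB cable - 2.0 m",
  "Φορητός υπολογιστής",
  "Ноутбук: 15.6",
  "笔记本电脑",
];

for (const title of titles) {
  console.log(`${originalPattern.test(title) ? "ok  " : "FAIL"} ${title}`);
}
```

Only the ASCII title passes; the other three are rejected.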

NOTE: For the Greek characters I have already tried α-ωΑ-Ω as well as \x{0374}-\x{03FF}, with no luck. Also, this technique does not cover the alphabets of other languages.

So, is there a way to match all of these characters in one regex?

Upvotes: 2

Views: 1545

Answers (1)

Joop Eggen

Reputation: 109613

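Add \p{L}\p{M} to the character class: these are the Unicode character properties (general categories) for Letters and for Marks, which include combining diacritical marks. Don't forget the marks. They are zero-width characters such as accents, and é can be written either as a single letter (U+00E9) or as the letter e followed by a combining acute accent (U+0301). Some alphabets also put more than one accent on a single letter.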

As @MeriaonosNikos pointed out in the comments, don't forget the Unicode modifier u at the end of the regex (/.../u), so that the pattern and the subject string are treated as UTF-8.
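Applied to the question's pattern, the suggestion gives /^[\w\p{L}\p{M} \-\.\:]+$/u. Below is a minimal sketch of the same idea in TypeScript, again assuming a JavaScript host rather than the PCRE the question probably uses. In JavaScript, \p{...} escapes only work with the u flag, and u mode rejects unnecessary escapes such as \:, so the class is written as [\w\p{L}\p{M} .:-] with the hyphen last. The helper name isValidTitle and the sample strings are hypothetical.

```typescript
// Letters (\p{L}) and combining marks (\p{M}) from any script, plus the
// question's original set: word characters, space, '.', ':' and '-'.
// The u flag is required for \p{...} escapes in JavaScript.
const titlePattern: RegExp = /^[\w\p{L}\p{M} .:-]+$/u;

// Hypothetical helper name, for illustration only.
function isValidTitle(title: string): boolean {
  return titlePattern.test(title);
}

// "é" written two ways: precomposed U+00E9, and "e" + combining acute U+0301.
const precomposed = "Caf\u00e9";
const decomposed = "Cafe\u0301";

const samples: string[] = [
  "USB cable - 2.0 m",
  "Φορητός υπολογιστής",
  "Ноутбук: 15.6",
  "笔记本电脑",
  precomposed,
  decomposed,
  "Price: $10", // '$' is not in the allowed set, so this is rejected
];

for (const s of samples) {
  console.log(`${isValidTitle(s) ? "ok  " : "FAIL"} ${s}`);
}
```

The decomposed "Cafe\u0301" only passes because of \p{M}: U+0301 is a combining mark, not a letter.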

Upvotes: 1
