Joe Half Face

Reputation: 2333

Regex matches non-english letters as non-word characters

@raw_array[i]=~/[\W]/

This is a very simple regexp. When I try it with some non-Latin letters (Russian, to be specific), the condition is false.

What can I do with this?

Upvotes: 7

Views: 2742

Answers (2)

Marcelo De Polli

Reputation: 29291

@raw_array[i] =~ /[\p{L}]/

Tested with Cyrillic characters.

Reference: http://www.regular-expressions.info/unicode.html#prop
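A minimal sketch of how the `\p{L}` property behaves, assuming Ruby 2.0+ with UTF-8 source and strings. The helper names are hypothetical and only illustrate the idea. Note that `\p{L}` matches letters only (not digits or `_`), and `\P{L}` is its negation, which is closer to what the `\W` in the question was checking for.

```ruby
# Hypothetical helpers, for illustration only.

# True if the string contains at least one Unicode letter (any script).
def contains_letter?(str)
  !!(str =~ /\p{L}/)
end

# True if the string contains at least one character that is NOT a letter.
def contains_non_letter?(str)
  !!(str =~ /\P{L}/)
end

puts contains_letter?("привет")       # true
puts contains_non_letter?("привет")   # false
puts contains_non_letter?("привет1")  # true, because of the digit
```

If digits and underscores should also count as word characters, the letter property alone is too narrow and a word-class check is likely a better fit.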

Upvotes: 9

Darshan Rivka Whittle

Reputation: 34031

From the Regexp documentation:

/\W/ - A non-word character ([^a-zA-Z0-9_])

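So \W is specifically not Unicode-aware: any character outside a-z, A-Z, 0-9 and _ counts as a non-word character, Cyrillic letters included. Perhaps something like this will work better for you: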

@raw_array[i]=~/[^[:word:]]/
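To compare the two patterns side by side, here is a small sketch, assuming Ruby 2.0+ and UTF-8 strings. The helper names and the sample strings are made up for illustration, since the contents of `@raw_array` are not shown in the question.

```ruby
# Hypothetical helpers; the original @raw_array is not shown.

# ASCII-only check: anything outside [a-zA-Z0-9_] counts as non-word.
def ascii_non_word?(str)
  !!(str =~ /\W/)
end

# Unicode-aware check using the POSIX bracket expression [[:word:]].
def unicode_non_word?(str)
  !!(str =~ /[^[:word:]]/)
end

["привет", "hello", "привет!"].each do |s|
  puts "#{s}: \\W=#{ascii_non_word?(s)}, [^[:word:]]=#{unicode_non_word?(s)}"
end
```

With `\W`, the purely Cyrillic string is reported as containing non-word characters, while `[^[:word:]]` only reports the string that ends in `!`.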

Upvotes: 2
