Alexey Zakharov

Reputation: 25102

Regex "\w" doesn't process utf-8 characters in Ruby 1.9.2

The regex \w doesn't match non-ASCII UTF-8 characters in Ruby 1.9.2. Has anybody faced the same problem?

Example:

/[\w\s]+/u

In my Rails application.rb I've added config.encoding = "utf-8".

Upvotes: 2

Views: 1600

Answers (2)

Gonzalo S

Reputation: 894

You could always use an explicit character class such as

[a-zA-Z0-9_ñáéíóú]

instead of \w, listing whichever accented characters you need.
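A minimal sketch of this approach, assuming UTF-8 source and strings. The constant name WORD_CHARS is made up for illustration; the character list is the one from the answer. Note that any character not listed in the class (ü here) still splits the match:

```ruby
# encoding: utf-8
# WORD_CHARS is a hypothetical name; the class is the one suggested in the answer.
WORD_CHARS = /[a-zA-Z0-9_ñáéíóú]+/

listed   = "año_1".scan(WORD_CHARS)  # every character is in the class
unlisted = "über".scan(WORD_CHARS)   # "ü" is not in the class

p listed
p unlisted
```

This works when the set of accented characters is small and known in advance, but the class has to be extended by hand for every new character.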

Upvotes: 0

hobbs

Reputation: 239801

What do you mean by "doesn't match utf-8 characters"? If you expect \w to match anything other than exactly the uppercase and lowercase ASCII letters, the ASCII digits, and underscore, it won't. Ruby 1.9 defines \w as equivalent to [a-zA-Z0-9_] regardless of the string's encoding. You probably want \p{Word} or a similar Unicode property class instead.

Ref: Ruby 1.9 Regexp documentation (see section "Character Classes").
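To illustrate the difference, here is a small sketch comparing \w with \p{Word} on a UTF-8 string. The sample text and variable names are made up for the example; the non-ASCII letters are written as \u escapes so the snippet does not depend on the source file's encoding:

```ruby
# Sample text is an assumption for illustration: "niño café_2" as UTF-8.
text = "ni\u00F1o caf\u00E9_2"

# \w is ASCII-only, so each accented letter ends the current match.
ascii_words = text.scan(/\w+/)

# \p{Word} also covers Unicode letters, marks, numbers and connector punctuation.
unicode_words = text.scan(/\p{Word}+/)

p ascii_words
p unicode_words
```

With \p{Word}, the pattern from the question would become /[\p{Word}\s]+/.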

Upvotes: 9
