Reactormonk

Reputation: 21730

How to match Unicode words with Ruby 1.9?

I'm using Ruby 1.9 and trying to find a regex that makes the following comparison return true:

Encoding.default_internal = Encoding.default_external = 'utf-8'
"föö".match(/(\w+)/u)[1] == "föö"
# => false

Upvotes: 10

Views: 10057

Answers (3)

rogerdpack

Reputation: 66911

http://www.ruby-forum.com/topic/208777

and

http://www.ruby-forum.com/topic/210770

might have clues for you.

You can also use the (documented) \p{L} property, which matches any Unicode letter, for example:

$ ruby -ve "p '℉üüü' =~ /\p{L}/"
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
1
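The result of 1 above is the index of the first letter, since "℉" is a symbol rather than a letter. A minimal sketch of the same idea, assuming a UTF-8 source file (the variable names are only illustrative):

```ruby
# encoding: utf-8

text = "℉üüü"

# =~ returns the character index of the first match, or nil if nothing matches.
# "℉" is classified as a symbol, so \p{L} skips it and matches at index 1.
first_letter_index = text =~ /\p{L}/

# String#[] with a regex returns the matched substring itself.
letters = text[/\p{L}+/]

p first_letter_index
p letters
```

Note that \p{L} covers letters only; digits and underscores, which \w also accepts, need a broader property such as \p{Word}.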

Upvotes: 0

J-_-L

Reputation: 9177

You can manually turn on Unicode matching using the inline (?u) syntax:

"föö".match(/(?u)(\w+)/)[1] == "föö"
# => true

However, using Unicode property syntax (see steenslag's answer) or POSIX bracket syntax is better style, since both match Unicode codepoints without needing an extra option:

"föö".match(/(\p{word}+)/)[1] == "föö"
# => true

"föö".match(/([[:word:]]+)/)[1] == "föö"
# => true

See this blog post for more info about matching Unicode characters in Ruby regexes.
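To see the difference on more than one word, here is a small sketch that extracts words with String#scan. The sample string and variable names are made up for illustration, and it assumes a UTF-8 source file:

```ruby
# encoding: utf-8

text = "föö bär_1"

# Plain \w is ASCII-only here, so it breaks words at every non-ASCII letter.
ascii_words = text.scan(/\w+/)

# \p{Word} and [[:word:]] are Unicode-aware and keep each word whole.
unicode_words = text.scan(/\p{Word}+/)
posix_words   = text.scan(/[[:word:]]+/)

p ascii_words
p unicode_words
p posix_words
```

Both Unicode-aware forms should give the same two words, while the plain \w version returns fragments.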

Upvotes: 1

steenslag

Reputation: 80085

# encoding=utf-8
p "föö".match(/\p{Word}+/)[0] == "föö"

Upvotes: 37
