Reputation: 233
I want to match Chinese characters in a string, but it fails:
irb(main):016:0> "身高455478".scan(/\p{Han}/)
SyntaxError: (irb):16: invalid character property name {Han}: /\p{Han}/
from C:/Program Files/Ruby-2.1.0/bin/irb.bat:18:in `<main>'
What's wrong with it?
This seems very strange. Is it a character encoding problem?
Upvotes: 3
Views: 1898
Reputation: 122453
I can reproduce the problem in irb. The difference between my Ruby environment and those of people who can't reproduce it is that my irb encoding defaults to GBK, a Chinese character encoding.
This script reproduces the problem:
#encoding:GBK
p "身高455478".scan(/\p{Han}/)
It fails with the error: invalid character property name {Han}: /\p{Han}/
To fix the problem, use the UTF-8 encoding:
#encoding:utf-8
p "身高455478".scan(/\p{Han}/)
Outputs: ["\u8EAB", "\u9AD8"]
As @Stefan suggests, to make irb use UTF-8, start it with irb -E UTF-8.
To convert just this one string to UTF-8, use String#encode (the /u flag makes the regexp UTF-8 as well):
'身高455478'.encode('utf-8').scan(/\p{Han}/u)
#=> ["\u8EAB", "\u9AD8"]
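If strings may arrive in different encodings, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: han_chars is a hypothetical name, and the GBK input is built by transcoding a UTF-8 literal so that the example does not depend on the file's own encoding.

```ruby
# Hypothetical helper: transcode to UTF-8 first, then scan with a UTF-8 regexp.
def han_chars(text)
  text.encode(Encoding::UTF_8).scan(/\p{Han}/u)
end

# Simulate input that arrives GBK-encoded, as in the irb session above.
gbk_text = "\u8EAB\u9AD8455478".encode("GBK")
puts gbk_text.encoding.name

result = han_chars(gbk_text)
puts result.length
```

Transcoding inside the helper keeps callers from having to know which encoding the input started in; String#encode raises an error if the input contains bytes that are invalid in its declared encoding.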
Upvotes: 5