bluexuemei

Reputation: 233

How to match Chinese words in Ruby?

I want to match the Chinese characters in a string, but it fails:

irb(main):016:0> "身高455478".scan(/\p{Han}/)
SyntaxError: (irb):16: invalid character property name {Han}: /\p{Han}/
    from C:/Program Files/Ruby-2.1.0/bin/irb.bat:18:in `<main>'

What's wrong with it?

This is very strange. Is it a character encoding problem?

Upvotes: 3

Views: 1898

Answers (1)

Yu Hao

Reputation: 122453

I can reproduce the problem in irb. What distinguishes my Ruby environment from those of people who can't reproduce it is that my irb uses GBK, a Chinese character encoding, by default.
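
To see which encodings are in play in a given session, a minimal sketch along these lines may help. The helper name encoding_report is made up for illustration; the calls it relies on (__ENCODING__, Encoding.default_external, String#encoding) are standard Ruby.

```ruby
# Hypothetical helper: collect the encodings that decide how literals are read.
def encoding_report
  {
    source:   __ENCODING__,               # encoding of this file or irb session
    external: Encoding.default_external,  # default for data read from outside
    literal:  "abc".encoding              # string literals take the source encoding
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc}" }
```

If source reports GBK, that matches the situation described here; the values printed depend on the machine, so no output is shown.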

The following script reproduces it:

#encoding:GBK
p "身高455478".scan(/\p{Han}/)

It fails with the error: invalid character property name {Han}: /\p{Han}/

To fix the problem, use the UTF-8 encoding, where Unicode script properties such as \p{Han} are available:

#encoding:utf-8
p "身高455478".scan(/\p{Han}/)

Output: ["\u8EAB", "\u9AD8"]
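
Since the question asks about words rather than single characters, it may be worth noting that a + quantifier returns each run of consecutive Han characters as one string. The sketch below writes the string with \u escapes and uses the /u flag so that it does not depend on the source encoding. The helper names han_chars and han_runs are hypothetical.

```ruby
# Hypothetical helpers; the /u flag pins the regexp to UTF-8.
def han_chars(text)
  text.scan(/\p{Han}/u)    # one element per Han character
end

def han_runs(text)
  text.scan(/\p{Han}+/u)   # one element per run of consecutive Han characters
end

# "身高455478" written with \u escapes, so the string is UTF-8 in any source file.
text = "\u8EAB\u9AD8455478"
p han_chars(text).length  # => 2
p han_runs(text).length   # => 1
```

A run of Han characters is not necessarily a word in the linguistic sense; splitting Chinese text into real words needs a segmenter, which is outside what a regexp can do.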


As @Stefan suggests, you can make irb use UTF-8 by starting it with irb -E UTF-8.

To convert just this one string to UTF-8, use String#encode (the /u flag forces the regexp itself to UTF-8):

'身高455478'.encode('utf-8').scan(/\p{Han}/u)
#=> ["\u8EAB", "\u9AD8"]
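
The effect of the conversion can be checked by building a GBK string on purpose. This is only a sketch, assuming the GBK transcoder that ships with standard Ruby builds is available; han_in is a hypothetical helper name.

```ruby
# Hypothetical helper: convert to UTF-8 first, then scan for Han characters.
def han_in(str)
  str.encode("UTF-8").scan(/\p{Han}/u)
end

# Build a GBK-encoded copy of "身高455478" (written with \u escapes).
gbk = "\u8EAB\u9AD8455478".encode("GBK")
p gbk.encoding.name     # => "GBK"
p han_in(gbk).length    # => 2

# Without the conversion, a UTF-8 regexp cannot be applied to the GBK string.
begin
  gbk.scan(/\p{Han}/u)
rescue Encoding::CompatibilityError => e
  puts "scan failed: #{e.class}"
end
```

Calling encode on a string that is already UTF-8 simply returns a copy, so the helper should be safe to apply regardless of where the string came from.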

Upvotes: 5
