Moon soon

Reputation: 2856

How to match non-Unicode string with regexp in Ruby?

I want to match a string that contains \xa0, like:

"\xa0" =~ /\xa0/

But this raises an error:

SyntaxError: (eval):2: invalid multibyte escape: /\xa0/

I tried to match it with a Unicode escape instead:

"\xa0" =~ /\u00a0/

This raises an error too:

ArgumentError: invalid byte sequence in UTF-8

So, how can I match \xa0 in Ruby?

Upvotes: 1

Views: 281

Answers (1)

Stefan

Reputation: 114248

Not every byte sequence is a valid Unicode string (or, more specifically, a valid UTF-8 string).

Your single-byte string for example is not:

str = "\xa0"

str.encoding        #=> #<Encoding:UTF-8>
str.valid_encoding? #=> false
str.codepoints      #   ArgumentError (invalid byte sequence in UTF-8)
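
As a minimal sketch of why the lone byte is rejected: in UTF-8, U+00A0 (no-break space) is encoded as the two bytes 0xC2 0xA0, so a bare 0xA0 is a continuation byte with no lead byte in front of it. The variable names below are illustrative, and the snippet assumes the default UTF-8 source encoding.

```ruby
lone = "\xa0"    # a single continuation byte, not valid UTF-8
nbsp = "\u00a0"  # U+00A0 NO-BREAK SPACE, properly encoded

lone.bytes            #=> [160]
lone.valid_encoding?  #=> false

nbsp.bytes            #=> [194, 160]
nbsp.valid_encoding?  #=> true
nbsp =~ /\u00a0/      #=> 0
```

So /\u00a0/ is a perfectly good pattern; it is the subject string "\xa0" that cannot be read as UTF-8.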

To work with an arbitrary byte string, you have to set its encoding to binary (ASCII-8BIT):

str = "\xa0".b      # <-- note the .b

str.encoding        #=> #<Encoding:ASCII-8BIT>
str.valid_encoding? #=> true
str.codepoints      #=> [160]

and also set the regexp encoding to binary (via the n modifier):

str =~ /\xa0/n
#=> 0
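
The two steps can be wrapped together. This is only a sketch: first_a0_offset is a hypothetical helper name, not something Ruby provides, and it assumes you really do want a bytewise search.

```ruby
# Hypothetical helper: byte offset of the first 0xA0 byte in str, or nil.
def first_a0_offset(str)
  # String#b returns a binary (ASCII-8BIT) copy; the original string is untouched.
  str.b =~ /\xa0/n
end

first_a0_offset("\xa0")     #=> 0
first_a0_offset("abc\xa0")  #=> 3
first_a0_offset("abc")      #=> nil
first_a0_offset("\u00a0")   #=> 1  (the second byte of 0xC2 0xA0)
```

Note the last call: a bytewise match also hits 0xA0 bytes that are part of a valid multibyte UTF-8 character, so this approach fits raw binary data better than text.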

Upvotes: 3
