TopperH

Reputation: 2223

Whitespaces not escaped by /\s/

I'm working with some strings that behave in a strange manner.

Some whitespace characters are not matched by /\s/ in a Ruby regex.

"175  75   16C  101/99    R".gsub( /\s/ , 'x' )
 => "175 x75  x16C x101/99   xR"

The expected result is that every whitespace character gets converted to 'x'.

I tried forcing the string's encoding to UTF-8, but that doesn't help either. I need a regex that matches every kind of whitespace in my string so I can convert them all to regular spaces.

EDIT:

str.encode('utf-8').chars.each { |c| puts c.ord }     
49
55
53
160
32
55
53
160
160
32
49
54
67
160
32
49
48
49
47
57
57
160
160
160
32
82

Upvotes: 2

Views: 119

Answers (2)

Benji

Reputation: 611

As I understand the question, you want to convert every whitespace character to x. Your current regex, /\s/, only matches the ASCII whitespace set /[ \t\r\n\f]/ according to the Ruby docs. To match Unicode whitespace as well, such as the non-breaking space (codepoint 160) in your string, use the POSIX bracket expression [[:space:]], which is Unicode-aware in Ruby.

Unicode Regex

"175  75   16C  101/99    R".gsub(/[[:space:]]/ , 'x' )
 => "175xx75xxx16Cxx101/99xxxxR"
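
Since the question ultimately asks for regular spaces rather than 'x', here is a minimal sketch of that normalization. The sample string is rebuilt from the codepoints posted in the question's EDIT, with the non-breaking space written as an escape so it is visible; the helper names normalize_spaces and squeeze_spaces are hypothetical, not part of Ruby.

```ruby
# Non-breaking space (codepoint 160), written as an escape so it is visible.
NBSP = "\u00A0"

# Sample rebuilt from the codepoints listed in the question's EDIT.
sample = "175#{NBSP} 75#{NBSP}#{NBSP} 16C#{NBSP} 101/99#{NBSP}#{NBSP}#{NBSP} R"

# Hypothetical helper: replace every whitespace character (Unicode-aware)
# with a regular ASCII space, one for one.
def normalize_spaces(str)
  str.gsub(/[[:space:]]/, " ")
end

# Hypothetical helper: collapse each run of whitespace into a single space.
def squeeze_spaces(str)
  str.gsub(/[[:space:]]+/, " ")
end

puts sample.gsub(/[[:space:]]/, "x")  # 175xx75xxx16Cxx101/99xxxxR
puts normalize_spaces(sample)
puts squeeze_spaces(sample)           # 175 75 16C 101/99 R
```

Whether to keep the run lengths (normalize_spaces) or collapse them (squeeze_spaces) depends on whether the column spacing in the original data carries meaning.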

Upvotes: 3

Lucas Trzesniewski

Reputation: 51400

According to the Ruby docs, \s is shorthand for [ \t\r\n\f], which covers only a handful of ASCII whitespace characters.

If your string includes other whitespace characters, such as non-breaking spaces, you can replace \s with \p{Z}, which matches any character in the Unicode Separator category (space, line, and paragraph separators). Note that \p{Z} does not match control characters such as tab or newline, so use [\s\p{Z}] if you need both.
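
To make the difference between the candidate patterns concrete, the following sketch checks a few characters against each one. The character selection and the variable names are illustrative assumptions; only the regex syntax itself comes from Ruby.

```ruby
# A few whitespace-like characters, written as escapes so they are visible.
chars = {
  "space"    => " ",
  "tab"      => "\t",
  "newline"  => "\n",
  "nbsp"     => "\u00A0",  # non-breaking space, codepoint 160
  "em space" => "\u2003"
}

# Candidate patterns discussed in this thread.
patterns = {
  '\s'          => /\s/,
  '\p{Z}'       => /\p{Z}/,
  '[[:space:]]' => /[[:space:]]/,
  '[\s\p{Z}]'   => /[\s\p{Z}]/
}

# Build a table of which pattern matches which character.
results = chars.map do |name, ch|
  [name, patterns.transform_values { |re| re.match?(ch) }]
end.to_h

results.each do |name, row|
  puts format("%-9s %s", name, row.map { |pat, hit| "#{pat}=#{hit}" }.join("  "))
end
```

In this run, \s misses the non-breaking space and the em space, \p{Z} misses tab and newline, and both [[:space:]] and [\s\p{Z}] match all five characters.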

Upvotes: 5
