Reputation: 2223
I'm working with some strings that behave in a strange manner.
Some whitespace characters are not recognized by \s in a Ruby regex:
"175 75 16C 101/99 R".gsub( /\s/ , 'x' )
=> "175 x75 x16C x101/99 xR"
I expected every whitespace character to be converted to 'x'.
I tried forcing the string's encoding to UTF-8, but that didn't help either. I need a regex that matches every kind of whitespace in my string so I can convert it to a regular space.
EDIT:
str.encode('utf-8').chars.each { |c| puts c.ord }
49
55
53
160
32
55
53
160
160
32
49
54
67
160
32
49
48
49
47
57
57
160
160
160
32
82
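The code points in the dump above can be decoded directly. The following is a minimal sketch (the variable names are illustrative) that rebuilds the string from those values. It shows that 160 is U+00A0 NO-BREAK SPACE and that \s matches only the four ordinary spaces (code point 32). Because U+00A0 is already a valid UTF-8 character, this would also explain why forcing the encoding makes no difference.

```ruby
# Code points copied from the ord dump above
codepoints = [49, 55, 53, 160, 32, 55, 53, 160, 160, 32, 49, 54, 67, 160, 32,
              49, 48, 49, 47, 57, 57, 160, 160, 160, 32, 82]

# Array#pack("U*") rebuilds a UTF-8 string from the code points
str = codepoints.pack("U*")

# Code point 160 is U+00A0 NO-BREAK SPACE; 32 is the ordinary space
nbsp = "\u00A0"
puts str.count(nbsp)       # 7
puts str.count(" ")        # 4

# \s only matches ASCII whitespace, so only the ordinary spaces are replaced
replaced = str.gsub(/\s/, "x")
puts replaced.count("x")   # 4
puts replaced.count(nbsp)  # 7
```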
Upvotes: 2
Views: 119
Reputation: 611
From what I understand from the question, you want to convert all whitespace to x. According to the Ruby docs, \s only matches the ASCII whitespace characters /[ \t\r\n\f]/. To match Unicode whitespace as well, use the POSIX bracket expression [[:space:]], which is Unicode-aware when the string is UTF-8.
"175 75 16C 101/99 R".gsub(/[[:space:]]/ , 'x' )
"175xx75xxx16Cxx101/99xxxxR"
Upvotes: 3
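The [[:space:]] answer above can be reproduced in a self-contained form. This sketch writes the question's no-break spaces (U+00A0, code point 160 in the ord dump) as \u00A0 escapes so they are visible in the source. The second call is an assumption about the asker's end goal, turning each whitespace run into a single regular space; it adds + to collapse runs.

```ruby
# The question's string with the no-break spaces written as escapes
str = "175\u00A0 75\u00A0\u00A0 16C\u00A0 101/99\u00A0\u00A0\u00A0 R"

# [[:space:]] also matches U+00A0 on a UTF-8 string, so every character is replaced
puts str.gsub(/[[:space:]]/, "x")   # 175xx75xxx16Cxx101/99xxxxR

# Collapse each run of (any kind of) whitespace into one ordinary space
puts str.gsub(/[[:space:]]+/, " ")  # 175 75 16C 101/99 R
```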
Reputation: 51400
According to the Ruby docs, \s is shorthand for [ \t\r\n\f] (only a few ASCII whitespace characters; newer Ruby versions also include \v).
If your string includes other whitespace characters, such as non-breaking spaces, you can replace \s with \p{Z}, which matches characters with the Unicode Separator property (space, line, and paragraph separators), including U+00A0. Note that \p{Z} does not match control characters such as \t or \n, so use [\s\p{Z}] if you need both.
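A small comparison sketch shows which of \s, \p{Z} and [[:space:]] match a few common whitespace characters. The sample characters are chosen for illustration. The sketch assumes a UTF-8 source file and Ruby 2.4 or later for Regexp#match?. It ends with the combined class [\s\p{Z}].

```ruby
# Sample characters chosen for illustration
samples = {
  "space"          => " ",
  "tab"            => "\t",
  "newline"        => "\n",
  "no-break space" => "\u00A0",
  "em space"       => "\u2003"
}

patterns = {
  '\s'          => /\s/,
  '\p{Z}'       => /\p{Z}/,
  '[[:space:]]' => /[[:space:]]/
}

samples.each do |name, char|
  hits = patterns.select { |_, re| re.match?(char) }.keys
  puts "#{name}: #{hits.join(', ')}"
end
# space: \s, \p{Z}, [[:space:]]
# tab: \s, [[:space:]]
# newline: \s, [[:space:]]
# no-break space: \p{Z}, [[:space:]]
# em space: \p{Z}, [[:space:]]

# Combining both classes covers ASCII control whitespace and Unicode separators
combined = /[\s\p{Z}]/
puts samples.values.all? { |c| combined.match?(c) }  # true
```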
Upvotes: 5