Reputation: 730
I am trying to write a method that removes blacklisted characters, such as BOM characters, using their UTF-8 values. I got this working by adding a method to the String class with the following logic:
def remove_blacklist_utf_chars
  self.force_encoding("UTF-8").gsub!(config[:blacklist_utf_chars][:zero_width_space].force_encoding("UTF-8"), "")
  self
end
To make this reusable across applications, I moved the blacklist into a config in a YAML file. The YAML structure is something like:
:blacklist_utf_chars:
  :zero_width_space: '"\u{200b}"'
(Edit) Also as suggested by Drenmi this didn't work,
:blacklist_utf_chars:
  :zero_width_space: \u{200b}
The problem is that remove_blacklist_utf_chars does not work when I load the blacklisted characters from the YAML file, but it does work when I pass them directly in the method instead of via the YAML file.
So basically
self.force_encoding("UTF-8").gsub!("\u{200b}".force_encoding("UTF-8"), "")
-- works.
but,
self.force_encoding("UTF-8").gsub!(config[:blacklist_utf_chars][:zero_width_space].force_encoding("UTF-8"), "")
-- doesn't work.
I printed the value of config[:blacklist_utf_chars][:zero_width_space] and it is equal to "\u{200b}".
I got this idea by referring: https://stackoverflow.com/a/5011768/2362505.
I am not sure what exactly happens when the blacklist is loaded from YAML into Ruby code.
EDIT 2:
On further investigation I observed that an extra \ is being added when the hash is read from the YAML.
So,
puts config[:blacklist_utf_chars][:zero_width_space].dump
prints:
"\\u{200b}"
But then if I just define the yaml as:
:blacklist_utf_chars:
  :zero_width_space: 200b
and do,
ch = "\u{#{config[:blacklist_utf_chars][:zero_width_space]}}"
self.force_encoding("UTF-8").gsub!(ch.force_encoding("UTF-8"), "")
I get
/Users/harshsingh/dir/to/code/utils.rb:121: invalid Unicode escape (SyntaxError)
Upvotes: 0
Views: 591
Reputation: 79783
The "\u{200b}" syntax is used for escaping Unicode characters in Ruby source code. It won't work inside YAML. The equivalent syntax for a YAML document is the similar "\u200b" (which also happens to be valid in Ruby). Note the lack of braces ({}); the double quotes are also required, otherwise the value will be parsed as the literal characters \u200b.
So your Yaml file should look like this:
:blacklist_utf_chars:
  :zero_width_space: "\u200b"
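To see the difference the quoting makes, here is a minimal sketch that parses both forms and then strips the character. It uses string keys instead of the question's symbol keys (an assumption, just to keep the sketch short), and all variable names are made up for illustration. The last two lines show one possible alternative if the config should store only the hex code point: "\u{...}" is resolved when Ruby parses the source, so it cannot wrap an interpolation, but the character can be built at runtime instead.

```ruby
require "yaml"

# %q(...) keeps backslashes literal, so each YAML document below reaches
# the parser exactly as it would appear in a file.
# String keys are an assumption made for brevity.
double_quoted = YAML.load(%q(zero_width_space: "\u200b"))["zero_width_space"]
single_quoted = YAML.load(%q(zero_width_space: '\u200b'))["zero_width_space"]

# Only the double-quoted scalar is unescaped to the single character U+200B;
# the single-quoted one stays as the six literal characters \u200b.
double_quoted_is_zwsp = (double_quoted == "\u200b")
single_quoted_is_zwsp = (single_quoted == "\u200b")

# With the real character loaded, a plain gsub removes it.
cleaned = "foo\u200bbar".gsub(double_quoted, "")

# Hypothetical alternative: keep only the hex code point in the config
# and build the character at runtime instead of via a "\u{...}" literal.
hex = "200b"
from_hex = Integer(hex, 16).chr(Encoding::UTF_8)
```

Printing double_quoted.dump and single_quoted.dump is a quick way to confirm which form your own config file produces.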
Upvotes: 2
Reputation: 8777
If you puts the value and get the output "\u{200b}", it means the quotes are included in your string. I.e., you're actually calling:
self.force_encoding("UTF-8").gsub!('"\u{200b}"'.force_encoding("UTF-8"), "")
Try changing your YAML file to:
:blacklist_utf_chars:
  :zero_width_space: \u{200b}
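A rough way to check what the loaded string really contains is to look at its length and characters rather than at puts output. The sketch below assumes string keys and uses made-up variable names. Note that, as the question's edit reports, the unquoted form still arrives as literal characters, not as U+200B.

```ruby
require "yaml"

# %q(...) keeps the backslash literal, as it would be in a YAML file.
# String keys are an assumption made for brevity.
with_quotes = YAML.load(%q(zero_width_space: '"\u{200b}"'))["zero_width_space"]
unquoted    = YAML.load(%q(zero_width_space: \u{200b}))["zero_width_space"]

# The single-quoted value keeps the inner double quotes as part of the string.
starts_with_quote = with_quotes.start_with?('"')

# Neither value contains the actual zero width space character.
with_quotes_has_zwsp = with_quotes.include?("\u200b")
unquoted_has_zwsp    = unquoted.include?("\u200b")

puts with_quotes.dump
puts unquoted.dump
```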
Upvotes: 1