Reputation: 132307
I'm using Ruby (and Nokogiri, in case that is helpful) to encode some documents. I want to convert literal Unicode characters (like “) to HTML entities (like &#8220;). How do I do this? I know I can do a single character with something like
s = '“'
puts "&##{s.unpack('U').first};" # gives &#8220;
but is there a way to do this properly using iconv or Nokogiri?
Upvotes: 0
Views: 576
Reputation: 54992
It may not be proper, but Nokogiri does this (actually libxml2, I think) when it doesn't understand the encoding:
Nokogiri::HTML(html, nil, 'klingon')
Upvotes: 1
Reputation: 80075
There is the HTMLEntities gem. For its decimal encoding, it does about the same as your code (unpack).
Upvotes: 1
Reputation: 132307
I've come up with this method. It takes a rather brute-force approach that could surely (hopefully?) be replaced by a compiled library solution, but it works:
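def clean(text)
  # Replace every character outside printable ASCII (U+0020 to U+007E)
  # with a decimal HTML entity. Note that this also converts tabs and newlines.
  text.gsub(/[^\u{20}-\u{7E}]/) { |char| "&##{char.unpack('U')[0]};" }
end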
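For what it's worth, a similar result seems reachable without the regex by letting Ruby's built-in String#encode do the scanning. The following is only a minimal sketch, assuming Ruby 1.9.3 or later and a UTF-8 source string; the method name to_numeric_entities is made up for illustration, and only String#encode with its :fallback option comes from Ruby itself.

```ruby
# Hypothetical helper name; String#encode and its :fallback option are Ruby built-ins.
def to_numeric_entities(text)
  # Any character that has no US-ASCII equivalent is handed to the fallback
  # proc, which returns a decimal character reference for its code point.
  text.encode('US-ASCII', fallback: ->(char) { "&##{char.ord};" })
end

puts to_numeric_entities("“quoted” text") # prints &#8220;quoted&#8221; text
```

Unlike clean above, this leaves tabs, newlines, and other ASCII control characters untouched, since they are valid US-ASCII. It also does not escape &, < or >, so it is not a substitute for general HTML escaping.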
Upvotes: 0