dsg

Reputation: 13004

Ruby HTML unicode to actual characters

I am trying to convert a string of HTML numeric character references, for example:

&#12452;&#12473; &#12471;&#12540;&#12488; &#26885;&#23376;

into the characters they represent (sorry if this doesn't render properly for you): イス シート 椅子

I've tried CGI::unescapeHTML(str), but I still see the numeric character references rather than the characters.

I've tried writing the output to a file (just in case it's simply not rendering properly in the terminal) and opening it with TextEdit/vim but that hasn't helped.
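To illustrate what the conversion involves: each numeric reference is just a Unicode code point, written either in decimal (&#12452;) or in hex (&#x30A4;). The following is a minimal sketch using only core Ruby, assuming the result should be a UTF-8 string. The helper name decode_numeric_refs is hypothetical, and the sketch deliberately ignores named entities such as &amp;.

```ruby
# Hypothetical helper (minimal sketch): decode only numeric character
# references, decimal &#NNNN; and hex &#xHHHH; / &#XHHHH;, into UTF-8.
# Named entities such as &amp; are left untouched.
def decode_numeric_refs(str)
  str.gsub(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/) do
    match = Regexp.last_match
    # Group 1 holds hex digits, group 2 holds decimal digits.
    code_point = match[1] ? match[1].to_i(16) : match[2].to_i
    begin
      code_point.chr(Encoding::UTF_8)
    rescue RangeError
      # Surrogates and out-of-range values are not valid UTF-8
      # characters, so keep the original reference text.
      match[0]
    end
  end
end

input = "&#12452;&#12473; &#12471;&#12540;&#12488; &#26885;&#23376;"
puts decode_numeric_refs(input) # prints イス シート 椅子
```

This covers only the numeric forms; named entities and the other edge cases are exactly what a dedicated library such as the htmlentities gem is meant to handle.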

Upvotes: 2

Views: 908

Answers (1)

mu is too short

Reputation: 434615

You could use the htmlentities gem. There is also the hex notation to consider (e.g. &#12452; is the same as &#x30A4; or "イ"). There's no good reason to do this by hand, and probably miss edge cases and notations you might not be aware of, when there is a complete and tested library that will do it for you.
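A small worked check of the equivalence mentioned above, in plain Ruby with no gem required: the decimal reference and the hex reference parse to the same integer, which is the Unicode code point of イ (U+30A4). The regex extraction here is only for illustration.

```ruby
# Pull the number out of each reference form (illustration only).
decimal = "&#12452;"[/&#(\d+);/, 1].to_i                # 12452
hex     = "&#x30A4;"[/&#x([0-9a-fA-F]+);/, 1].to_i(16)  # 0x30A4 == 12452

puts decimal == hex                # prints true
puts decimal.chr(Encoding::UTF_8)  # prints イ
puts "イ".ord.to_s(16)             # prints 30a4
```

With the gem installed (gem install htmlentities), decoding both notations, plus named entities, is a single call: HTMLEntities.new.decode(str).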

Upvotes: 5
