Reputation: 7249
I have an HTML document saved in my database as follows:
\\u003cp style=\\\"text-align: center; opacity: 1;\\\"\\u003e\\u003cstrong\\u003e\\u003cspan style=\\\"font-size: 18pt;\\\
I know it is ugly and not the way it should be done, but this is a legacy system.
My task is to take all of these HTML fragments and convert each one into a document in Google Docs. Google Docs parses HTML into its internal format pretty well, but the HTML has to be valid, with <p> instead of \\u003cp.
I'm trying to convert/decode/parse this string into valid HTML, but so far without any luck. I tried the htmlentities gem, CGI decode, Nokogiri::HTML.parse, and JSON.parse, and none of them did the job.
I also tried string.encode(xxxx), again without luck. I was really hoping that the .encode method would do it, but I couldn't make it work. Maybe I'm using the wrong encoding? (I tried all of the ISO-xxx encodings.)
Upvotes: 0
Views: 711
Reputation: 28305
Here's a quick workaround for you:
input_string.gsub(/\\u(\h{4})/) { [$1.to_i(16)].pack('U') }
With the example input you gave above, this results in:
"<p style=\\\"text-align: center; opacity: 1;\\\"><strong><span style=\\\"font-size: 18pt;\\"
Explanation:
\u003c == <. The left-hand side is an escaped Unicode character; this is not the same thing as \\u003c, which is a literal backslash followed by u003c.
The regular expression \\u(\h{4}) matches every occurrence of that literal sequence (\h stands for "hexadecimal" and is equivalent to [0-9a-fA-F]). The block converts the four captured hex digits into an integer code point with to_i(16), and Array#pack with the 'U' directive turns that code point into a UTF-8 character.
Ideally, of course, you'd solve the problem at its root rather than retrofit a workaround like this. But if that's outside of your control, then a workaround will have to suffice.
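To make the difference between the two forms concrete, here is a minimal sketch. The helper name decode_unicode_escapes and the sample string are made up for illustration; the gsub call itself is the same one as above.

```ruby
# Hypothetical helper, not part of any library: wraps the gsub shown above.
def decode_unicode_escapes(text)
  # Match a literal backslash, the letter "u", and exactly four hex digits,
  # then replace the match with the character for that code point.
  text.gsub(/\\u(\h{4})/) { [$1.to_i(16)].pack('U') }
end

# Single quotes keep the backslashes literal, which mimics the stored data.
escaped = '\u003cp\u003eHello\u003c/p\u003e'
decoded = decode_unicode_escapes(escaped)

puts escaped.length # each \uXXXX sequence still counts as six characters here
puts decoded
```

The single quotes matter in the sample: inside double quotes Ruby would already have turned \u003c into < before gsub ever ran, so there would be nothing left to decode.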
Upvotes: 1
Reputation: 15258
Using Array#pack:
string = "\\u003cp style=\\\"text-align: center; opacity: 1;\\\"\\u003e\\u003cstrong\\u003e\\u003cspan style=\\\"font-size: 18pt;\\"
string.gsub(/\\u(....)/) { [$1.hex].pack("U") }
# => "<p style=\\\"text-align: center; opacity: 1;\\\"><strong><span style=\\\"font-size: 18pt;\\"
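One caveat with the .... pattern, sketched here with a made-up sample string: the dots match any four characters, not only hex digits, so a backslash followed by u that is not really an escape gets rewritten too. String#hex returns 0 for non-hex input, so such a match turns into a NUL character. Restricting the capture to \h{4} should avoid that.

```ruby
# Hypothetical sample: a path-like fragment next to two real escapes.
sample = 'C:\users \u003cp\u003e'

# Loose pattern: any four characters after \u are treated as hex digits.
loose = sample.gsub(/\\u(....)/) { [$1.hex].pack("U") }

# Strict pattern: only four hexadecimal digits after \u are accepted.
strict = sample.gsub(/\\u(\h{4})/) { [$1.hex].pack("U") }

p loose  # "\users" has been replaced by a NUL character
p strict # "\users" is left alone, only the real escapes are decoded
```

If the stored data never contains a stray backslash followed by u, both patterns give the same result.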
Upvotes: 1