Reputation: 7932
I want to read web page A, which the browser reports as ISO-8859-1, and return its content in UTF-8 as part of web page B.
That is, I want to show the content of page A in the same charset I use for the rest of page B, which is UTF-8.
How do I do this in Java/Groovy?
Thanks in advance.
Upvotes: 2
Views: 6418
Reputation: 8078
In Groovy you could write something like this:
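// Decode the page's bytes as ISO-8859-1 into a String
def source = new URL("http://www.google.com").getText("ISO-8859-1")
// Encode that String as UTF-8 bytes; a String itself carries no charset
def target = source.getBytes("UTF-8")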
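For a plain Java counterpart, here is a minimal sketch of the same in-memory idea: decode the ISO-8859-1 bytes into a String, then encode that String as UTF-8 bytes. The class and method names (`InMemoryTranscode`, `latin1ToUtf8`) are made up for illustration, and the sample input is a hard-coded byte rather than a fetched page.

```java
import java.nio.charset.StandardCharsets;

public class InMemoryTranscode {
    // Hypothetical helper: re-encodes ISO-8859-1 bytes as UTF-8 bytes.
    public static byte[] latin1ToUtf8(byte[] latin1) {
        // Decoding yields a String (UTF-16 chars internally).
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        // Encoding the String produces the UTF-8 byte sequence.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // "é" in ISO-8859-1
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(utf8.length); // prints 2 (0xC3 0xA9)
    }
}
```

The charset only matters at the byte boundaries, so the result should be kept as bytes (or written through a UTF-8 writer) rather than turned back into a String.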
Upvotes: 3
Reputation: 108859
You don't say what stack you're building on or how you're accessing the content, but the general mechanism for such a transcoding operation is to use UTF-16 as an intermediary; that is, convert ISO-8859-1 bytes to UTF-16 chars to UTF-8 bytes.
You could use InputStreamReader (with an ISO-8859-1 Charset), then write bytes via OutputStreamWriter (with a UTF-8 Charset).
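The reader/writer pairing described above can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (`Transcoder`, `latin1ToUtf8`) are hypothetical, and in-memory streams stand in for the real source and destination, which the question does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Transcoder {
    // Hypothetical helper: copies ISO-8859-1 bytes from `in` to `out` as UTF-8.
    public static void latin1ToUtf8(InputStream in, OutputStream out) throws IOException {
        // Bytes -> chars, decoding as ISO-8859-1.
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        // Chars -> bytes, encoding as UTF-8.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        // Flush so buffered chars reach `out`; the caller owns and closes the streams.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9}; // "café"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        latin1ToUtf8(new ByteArrayInputStream(latin1), out);
        System.out.println(out.size()); // prints 5: "é" takes two bytes in UTF-8
    }
}
```

Streaming through a fixed-size char buffer avoids holding the whole page in memory, which may matter for large responses.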
Some APIs provide encoding operations as part of their I/O classes (e.g. ServletResponse.getWriter()).
I'm ignoring any need to parse and transform the data, which is a whole other can of worms.
Upvotes: 1