user2427

Reputation: 7932

How to convert a webpage from ISO-8859-1 to UTF-8 in Java/Groovy

I want to read webpage A, which the browser reports as ISO-8859-1, and return its content in UTF-8 as part of webpage B.

That is, I want to show the content of page A in the same charset as the rest of page B, which is UTF-8.

How do I do this in Java/Groovy?

Thanks in advance.

Upvotes: 2

Views: 6418

Answers (2)

Christoph Metzendorf

Reputation: 8078

In Groovy you could write something like this:

// Decode the page's ISO-8859-1 bytes into a String
def source = new URL("http://www.google.com").getText("ISO-8859-1")
// Encode that String as UTF-8 bytes
def target = source.getBytes("UTF-8")

Upvotes: 3

McDowell

Reputation: 108859

You don't say what stack you're building on or how you're accessing the content, but the general mechanism for such a transcoding operation is to use UTF-16 as an intermediary; that is, convert ISO-8859-1 bytes to UTF-16 chars to UTF-8 bytes.
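For content that is already in memory, the bytes-to-chars-to-bytes route can be sketched in plain Java as follows. This is only a minimal illustration: the class name TranscodeInMemory and the method latin1ToUtf8 are made up for this example, and it assumes the input really is ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class TranscodeInMemory {
    // Hypothetical helper: decode ISO-8859-1 bytes into a String
    // (UTF-16 chars internally), then encode that String as UTF-8 bytes.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the single byte 0xE9 in ISO-8859-1.
        byte[] latin1 = { (byte) 0xE9 };
        byte[] utf8 = latin1ToUtf8(latin1);
        // Print each resulting byte as two hex digits.
        for (byte b : utf8) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

The single ISO-8859-1 byte 0xE9 becomes the two UTF-8 bytes 0xC3 0xA9, which is why the byte count can grow during the conversion.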

You could read the input through an InputStreamReader (with an ISO-8859-1 Charset) to decode it into chars, then write those chars through an OutputStreamWriter (with a UTF-8 Charset) to encode them as bytes.
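A rough sketch of that reader/writer approach, assuming the source is available as an InputStream and the destination as an OutputStream. The class name TranscodeStream and the method transcode are invented for illustration, and the in-memory streams in main merely stand in for a real page download and response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TranscodeStream {
    // Hypothetical helper: copy chars from an ISO-8859-1 reader
    // to a UTF-8 writer. The caller remains responsible for closing
    // the underlying streams.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        // Push any chars still buffered in the writer out as bytes.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" is 4 bytes in ISO-8859-1.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), out);
        System.out.println(latin1.length + " bytes in, " + out.size() + " bytes out");
    }
}
```

Because the copy works on a fixed-size char buffer, the whole page never has to be held in memory at once.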

Some APIs provide encoding operations as part of their I/O classes (e.g. ServletResponse.getWriter()).

I'm ignoring any need to parse and transform the data, which is a whole other can of worms.

Upvotes: 1
