Thibault

Reputation: 598

Read a UTF-8 encoded text file from the internet in Java

I want to read an XML file from the internet. You can find it here.
The problem is that it is encoded in UTF-8 and I need to store it in a file so that I can parse it later. I have already read a lot of topics about this, and here is what I came up with:

BufferedReader in;
String readLine;
try
{
    in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
    BufferedWriter out = new BufferedWriter(new FileWriter(file));

    while ((readLine = in.readLine()) != null)
        out.write(readLine+"\n");

    out.close();
}

catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}

catch (IOException e)
{
    e.printStackTrace();
}

This code works until it reaches this line: <title>Chérie FM</title>
When I debug, I get this: <title>Ch�rie FM</title>

Obviously, there is something I fail to understand, but as far as I can tell I followed the code I saw on several websites.

Upvotes: 2

Views: 3421

Answers (2)

Maurício Linhares

Reputation: 40333

This file is not encoded as UTF-8; it is ISO-8859-1.

By changing your code to:

// try-with-resources closes both streams, even if an exception is thrown.
try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "ISO-8859-1"));
     BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF-8")))
{
    String readLine;

    // Decode the download as ISO-8859-1 and re-encode each line as UTF-8.
    while ((readLine = in.readLine()) != null)
        out.write(readLine+"\n");
}

catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}

catch (IOException e)
{
    e.printStackTrace();
}

You should have the expected result.
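
To see why the UTF-8 decode goes wrong, here is a minimal sketch (the class and method names are made up for illustration). In ISO-8859-1 the letter é is the single byte 0xE9. That byte followed by 'r' is not a valid UTF-8 sequence, so a UTF-8 decoder substitutes U+FFFD, which is the � you saw in the debugger.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class CharsetDemo {
    // Interpret the bytes as ISO-8859-1: every byte maps to one character.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Interpret the same bytes as UTF-8: malformed input becomes U+FFFD.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Chérie FM" as the server sends it: é is the single byte 0xE9.
        byte[] raw = "Ch\u00e9rie FM".getBytes(StandardCharsets.ISO_8859_1);

        // Equals "Ch\u00e9rie FM": the text survives.
        System.out.println(decodeLatin1(raw));
        // Equals "Ch\uFFFDrie FM": the é is replaced by U+FFFD.
        System.out.println(decodeUtf8(raw));
    }
}
```

The same bytes give two different strings, so the charset passed to InputStreamReader has to match what the server actually sends.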

Upvotes: 8

Angelo Fuchs

Reputation: 9941

If you need to write a file in a given encoding, use FileOutputStream instead.

// Decode the download, then encode every line explicitly as UTF-8 bytes.
in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
FileOutputStream out = new FileOutputStream(file);

while ((readLine = in.readLine()) != null)
    out.write((readLine+"\n").getBytes("UTF-8"));

out.close();
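
If the file only has to be stored so it can be parsed later, one possible alternative is to skip decoding and copy the bytes unchanged. This is a sketch under that assumption, not something taken from the question. RawDownload and save are hypothetical names, and the URL and target path are expected as command-line arguments. An XML parser given the raw bytes normally works out the encoding from the XML declaration itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of the original code.
class RawDownload {
    // Copies the stream byte for byte, so no charset is involved at all.
    // Returns the number of bytes written; closes the source when done.
    static long save(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: URL of the file, args[1]: where to store it (both assumed).
        InputStream source = URI.create(args[0]).toURL().openStream();
        long written = save(source, Paths.get(args[1]));
        System.out.println("Stored " + written + " bytes");
    }
}
```

Because nothing is decoded, a wrong guess about the charset cannot corrupt the stored file.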

Upvotes: -1
