Reputation: 143
I wanted to convert a file from one encoding to another (it doesn't matter which). But when I open the resulting file (w.txt), its contents are garbled. Windows does not interpret it correctly.
What output encoding should I pass (args[1]) so that Windows Notepad interprets the file correctly?
import java.io.*;
import java.nio.charset.Charset;

public class Kodowanie {
    public static void main(String[] args) throws IOException {
        args = new String[2];
        args[0] = "plik.txt";
        args[1] = "ISO8859_2";
        String linia, s = "";
        File f = new File(args[0]), f1 = new File("w.txt");
        FileInputStream fis = new FileInputStream(f);
        InputStreamReader isr = new InputStreamReader(fis,
                Charset.forName("UTF-8"));
        BufferedReader in = new BufferedReader(isr);
        FileOutputStream fos = new FileOutputStream(f1);
        OutputStreamWriter osw = new OutputStreamWriter(fos,
                Charset.forName(args[1]));
        BufferedWriter out = new BufferedWriter(osw);
        while ((linia = in.readLine()) != null) {
            out.write(linia);
            out.newLine();
        }
        out.close();
        in.close();
    }
}
input:
Ala
ma
Kota
output:
?Ala
ma
Kota
Why is there a '?' at the beginning?
Upvotes: 1
Views: 197
Reputation: 78579
US-ASCII is a subset of Unicode (a pretty small one, by the way). You are reading a file in UTF-8 and then writing it back in US-ASCII. The encoder therefore has to make a decision whenever a given Unicode character cannot be expressed in the reduced 7-bit US-ASCII subset. Typically, such a character is replaced by a default character, like ?.
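To make the replacement behaviour concrete, here is a minimal sketch. The class name ReplacementDemo and the helper roundTripAscii are made up for illustration and are not part of the asker's program. It relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's default replacement bytes for unmappable characters; for US-ASCII that replacement is '?'.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Hypothetical helper: encodes the text as US-ASCII and decodes it again,
    // so that any character the encoder had to replace becomes visible.
    static String roundTripAscii(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
        return new String(encoded, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Pure ASCII text survives unchanged.
        System.out.println(roundTripAscii("Ala ma kota")); // Ala ma kota

        // "Za" followed by four Polish letters (U+017C, U+00F3, U+0142, U+0107),
        // none of which exists in US-ASCII, so each one is replaced.
        System.out.println(roundTripAscii("Za\u017C\u00F3\u0142\u0107")); // Za????
    }
}
```

The replacement happens silently, with no exception, which is why the problem usually shows up only when the output file is opened.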
Keep in mind that many characters take several bytes in UTF-8, whereas US-ASCII uses only 7 bits per character. This means that no Unicode character above code point 127 can be expressed in US-ASCII. That could explain the question marks that you see once the file has been converted.
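A quick way to check whether a piece of text fits into a target charset before writing it is CharsetEncoder.canEncode. The sketch below uses hypothetical names (EncodableCheck, utf8Length, fitsIn). Its last check is only an assumption about the file in the question: if plik.txt starts with a byte order mark (U+FEFF), which some Windows editors write at the start of UTF-8 files, that character has no equivalent in the target charset and would come out as a leading '?'. The post itself does not confirm that the file contains one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every character of the text can be represented in the target charset.
    static boolean fitsIn(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        String bom = "\uFEFF"; // byte order mark; assumed, not confirmed by the post

        System.out.println(utf8Length("A"));      // 1
        System.out.println(utf8Length("\u0142")); // 2 (Polish l with stroke)
        System.out.println(utf8Length(bom));      // 3

        System.out.println(fitsIn("Ala", StandardCharsets.US_ASCII));    // true
        System.out.println(fitsIn("\u0142", StandardCharsets.US_ASCII)); // false
        System.out.println(fitsIn("\u0142", latin2));                    // true
        System.out.println(fitsIn(bom + "Ala", latin2));                 // false
    }
}
```

If that assumption holds, dropping a leading U+FEFF from the first line before writing it would remove the stray '?', and the remaining text would convert normally.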
I answered a similar question, Reading Strange Unicode Characters in Java. Perhaps it helps.
I also recommend that you read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Upvotes: 1