Reputation: 187
I am trying to write an Arabic word to a file with a buffered output stream in Java. After writing, Windows Notepad reports the file's encoding as UTF-8, so it seems the default charset Java uses for writing files is UTF-8. The strange part is that when I read the file back with a buffered input stream, it does not seem to be read as UTF-8, because the result is strange symbols.
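import java.io.*;

class writeFile extends BufferedOutputStream {
    public writeFile(OutputStream out) {
        super(out);
    }

    public static void main(String[] arg) throws IOException {
        // try-with-resources flushes and closes the buffered stream
        try (writeFile out = new writeFile(new FileOutputStream(new File("path_String")))) {
            // getBytes() with no argument encodes with the default charset
            out.write("مكتبة".getBytes());
        }
    }
}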
The word is written to the file correctly, but when I read it back:
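import java.io.*;

class readFile extends BufferedInputStream {
    public readFile(InputStream in) {
        super(in);
    }

    public static void main(String[] arg) throws IOException {
        try (readFile in = new readFile(new FileInputStream(new File("path_String")))) {
            int c;
            // read() returns one byte at a time; each byte is cast to a char on its own
            while ((c = in.read()) != -1)
                System.out.print((char) c);
        }
    }
}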
The result is not what was written to the file: ÙÙتبة
So does this mean that Java uses UTF-8 encoding when writing and a different encoding when reading?
Upvotes: 0
Views: 1758
Reputation: 114488
The issue is not that it is not reading with UTF-8; it's that you are trashing the encoding in your read operation. FileInputStream.read() is documented to read one byte at a time. Casting each byte to a character on its own is not going to work if you have multi-byte sequences in your file (which you almost certainly do, since it is in Arabic).
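To see why the per-byte cast breaks, here is a minimal sketch that works on an in-memory byte array instead of a file. The class name ByteCastDemo and the helper castEachByte are made up for illustration, and the word is written with Unicode escapes only so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Mimics the loop in the question: every byte becomes its own char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // b & 0xFF is the 0..255 value that InputStream.read() would return
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The five letters of the word from the question, as Unicode escapes
        String word = "\u0645\u0643\u062A\u0628\u0629";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        System.out.println(utf8.length);                  // 10: two bytes per letter
        System.out.println(castEachByte(utf8).length());  // 10: one wrong char per byte
        System.out.println(new String(utf8, StandardCharsets.UTF_8).length()); // 5
    }
}
```

Each Arabic letter here takes two bytes in UTF-8, so the cast produces ten unrelated characters where a real decoder produces five.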
As you figured out, the easiest solution is to use InputStreamReader, which reads the bytes from an underlying FileInputStream (or other stream) and correctly decodes the character sequences. The default encoding here is of course the same as for the writer:
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset. The charset that it uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
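A minimal sketch of the InputStreamReader approach might look like the following. It passes the charset explicitly rather than relying on the platform default, which is a choice made for this example and not something the question requires. ReaderDemo and readAll are hypothetical names, and a ByteArrayInputStream stands in for new FileInputStream("path_String") so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderDemo {
    // Decodes the whole stream with the given charset.
    // Reader.read() returns one decoded char at a time, not one byte.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for the file written in the question
        byte[] utf8 = "\u0645\u0643\u062A\u0628\u0629".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(text.length()); // 5
    }
}
```

The loop has the same shape as the one in the question; the only difference is that it reads from a Reader, so the cast to char is applied to an already decoded character.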
You can do a similar thing by reading the entire file into a byte buffer and then decoding the entire thing using something like String(byte[]). The results should be identical if you read the entire file, because now the decoder will have enough information to correctly parse out all the multi-byte characters.
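The whole-file variant can be sketched as below, assuming the file is small enough to fit in memory and was written as UTF-8. WholeFileDemo and roundTrip are hypothetical names, and a temporary file replaces "path_String" so the example cleans up after itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WholeFileDemo {
    // Writes text as UTF-8 to a temporary file, reads every byte back,
    // then decodes the complete byte array in a single step.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                byte[] all = Files.readAllBytes(tmp);
                return new String(all, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String word = "\u0645\u0643\u062A\u0628\u0629";
        System.out.println(roundTrip(word).equals(word)); // true
    }
}
```

Naming the charset in both getBytes and the String constructor keeps the write and read sides in agreement regardless of the platform default.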
There is a reference on encoding and decoding that I found very useful in understanding the subject: http://kunststube.net/encoding/
Upvotes: 1