Reputation: 1
I'm trying to write a simple character to a file and read it back in. Writing the character to the file appears to work fine (at least as it appears in a hex editor). When I read the character back into memory, it's a completely different value altogether. Here's my example code:
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class myclass {
    public static void main(String[] args) {
        char myChar = 158; // let myChar = 158
        System.out.println("myChar = "+(int)myChar); // prints 158. Good.
        try {
            FileOutputStream fileOut = new FileOutputStream("readthis");
            fileOut.write(myChar);
            fileOut.close();
        } catch (IOException e) {
            System.exit(1);
        }
        // If I examine the "readthis" file, there is one byte that has a value
        // of '9E' or 158. This is what I'd expect.
        // Let's now try to read it back into memory
        char readChar = 0;
        try {
            int i = 0;
            FileInputStream fstream = new FileInputStream("readthis");
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            readChar = (char)br.read();
            in.close();
        } catch (IOException e) {
            System.exit(1);
        }
        // Now, if we look at readChar, it's some value that's not 158!
        // Somehow it got read in as 382!
        // Printing this value results in 382
        System.out.println("readChar = "+(int)readChar);
    }
}
My question is, how did this happen? I would like readChar to equal its original value that I wrote (158), but I'm not sure what I'm doing wrong. Any help would be appreciated. Thanks.
Upvotes: 0
Views: 414
Reputation: 171
If you only want to write/read chars, try DataOutputStream#writeChar() and DataInputStream#readChar(), but InputStreamReader/OutputStreamWriter is more flexible.
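A minimal sketch of the writeChar()/readChar() pairing, assuming an in-memory buffer in place of the file; the class name DataStreamCharDemo and the helper roundTrip are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamCharDemo {
    // Writes one char with writeChar() and reads it back with readChar().
    // A byte array stands in for the "readthis" file.
    static char roundTrip(char c) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeChar(c); // writes two bytes, high byte first
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readChar(); // reads the same two bytes back
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char myChar = 158;
        System.out.println("readChar = " + (int) roundTrip(myChar)); // prints readChar = 158
    }
}
```

Note that this stores every char as two bytes, so the file is no longer a single 9E byte.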
Upvotes: 1
Reputation: 328714
EJP is right. The longer explanation: a character has two properties, and you're omitting one of them: the encoding.
This means that char myChar = 158 assigns myChar the Unicode code point 158 (U+009E, a control character that isn't printable).
When you write that to a file as a byte (using fileOut.write(int)), you're converting the Unicode character to the integer 158, and the encoding is lost. The write() method strips everything but the lower 8 bits of the integer (write(158+256) yields the same result as write(158)).
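The truncation to the lower 8 bits can be checked with a small sketch; ByteArrayOutputStream is used here instead of a file, and the names LowByteDemo and writtenByte are hypothetical:

```java
import java.io.ByteArrayOutputStream;

class LowByteDemo {
    // Returns the single byte that OutputStream.write(int) stores, as 0..255.
    static int writtenByte(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the low-order 8 bits are kept
        return out.toByteArray()[0] & 0xFF; // mask to undo Java's signed byte
    }

    public static void main(String[] args) {
        System.out.println(writtenByte(158));       // prints 158
        System.out.println(writtenByte(158 + 256)); // prints 158
    }
}
```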
When you read the data in again, you're using a Reader, which reads bytes and converts them into Unicode characters. To do this correctly, you need to specify the encoding with which the data was written. Since you didn't specify anything explicitly, Java uses the platform default encoding (the default for your OS).
So the reader reads the byte 158 and uses the default encoding to turn that into a char. If your default is windows-1252, for example, byte 0x9E maps to U+017E (ž), which is code point 382, the value you're seeing.
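To see how the chosen encoding changes the result, here is a rough sketch that decodes the single byte 0x9E with two different charsets. It assumes the asker's platform default was windows-1252 (the question doesn't say), and that charset is not guaranteed to exist on every JVM, so the code checks first. The String constructor is used as a stand-in for what an InputStreamReader does with the byte; DecodeDemo and decodeSingleByte are made-up names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes one byte with the given charset and returns the resulting
    // char as an int.
    static int decodeSingleByte(int b, Charset charset) {
        return new String(new byte[] {(byte) b}, charset).charAt(0);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps every byte to the code point with the same value.
        System.out.println(decodeSingleByte(0x9E, StandardCharsets.ISO_8859_1)); // prints 158

        // windows-1252 is optional on a JVM, so check before using it.
        if (Charset.isSupported("windows-1252")) {
            Charset cp1252 = Charset.forName("windows-1252");
            System.out.println(decodeSingleByte(0x9E, cp1252)); // prints 382
        }
    }
}
```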
To fix this, always use Reader/Writer along with InputStreamReader and OutputStreamWriter, which allow you to specify which encoding to use. UTF-8 is a good choice since all Java VMs can read it and all Unicode characters can be translated into/from this encoding.
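One way this could look, as a sketch rather than the only correct form: the same charset is passed explicitly on both the writing and the reading side. A temporary file is used so the example cleans up after itself; Utf8RoundTrip and roundTrip are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8RoundTrip {
    // Writes one char as UTF-8 text and reads it back as UTF-8 text.
    static int roundTrip(char c) {
        try {
            Path file = Files.createTempFile("readthis", ".txt");
            try {
                try (Writer writer = new OutputStreamWriter(
                        new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
                    writer.write(c); // U+009E becomes the two bytes C2 9E on disk
                }
                try (Reader reader = new BufferedReader(new InputStreamReader(
                        new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
                    return reader.read(); // decoded with the same charset
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char myChar = 158;
        System.out.println("readChar = " + roundTrip(myChar)); // prints readChar = 158
    }
}
```

The file now holds two bytes rather than one, which is expected: the on-disk form depends on the encoding, while the char value read back does not.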
Upvotes: 3
Reputation: 311008
You are writing bytes and reading chars. Use a Writer and a Reader, or an OutputStream and an InputStream.
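For the byte-stream option, a minimal sketch that keeps both sides as raw bytes so no character decoding takes place. It uses a temporary file, and the names ByteRoundTrip and roundTrip are made up for illustration:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ByteRoundTrip {
    // Writes one byte with an OutputStream and reads it back with an
    // InputStream; no Reader is involved, so no charset is applied.
    static int roundTrip(int value) {
        try {
            Path file = Files.createTempFile("readthis", ".bin");
            try {
                try (OutputStream out = new FileOutputStream(file.toFile())) {
                    out.write(value); // stores the low 8 bits as one byte
                }
                try (InputStream in = new FileInputStream(file.toFile())) {
                    return in.read(); // returns the byte as an int in 0..255
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char myChar = 158;
        char readChar = (char) roundTrip(myChar);
        System.out.println("readChar = " + (int) readChar); // prints readChar = 158
    }
}
```

This only round-trips values that fit in one byte (0 to 255); for arbitrary chars, the Writer/Reader pairing is the safer choice.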
Upvotes: 5