Reputation: 5502
I was told today that when writing to a file, the encoding you write in doesn't matter. I don't know a lot about encodings, but this sounds reasonable, considering an encoding is only for reading/viewing?
Does the encoding in which bytes are read from a file matter? Is the encoding only there for parsing/display?
ex.
var bytes = getFileBytes();
bytes.remove(new byte[] { 232, 211 });
anotherStream.writeBytes(bytes);
// I'm assuming that Encoding is irrelevant
Upvotes: 0
Views: 713
Reputation: 127881
Encoding does not matter when you are simply reading bytes from the file and are not trying to interpret these bytes as text. For example, you can safely ignore encoding if you want to, say, copy a file to another file or a file to a socket. Obviously, you also don't need an encoding if the stream contains binary data, e.g. a sequence of ints in binary form. Your example is also perfectly valid, as long as you do not interpret the 232 and 211 bytes as characters.
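As a rough illustration of the byte-only case, here is a minimal Java sketch of what the question's pseudocode seems to intend: removing a byte sequence without ever naming an encoding. The class and method names (ByteFilterDemo, removeAll, matchesAt) are made up for this example, and the input is an in-memory array rather than a real file.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ByteFilterDemo {
    // Returns a copy of data with every occurrence of pattern removed.
    // Only byte values are compared; no charset is involved anywhere.
    static byte[] removeAll(byte[] data, byte[] pattern) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            if (pattern.length > 0 && matchesAt(data, i, pattern)) {
                i += pattern.length; // skip the matched bytes
            } else {
                out.write(data[i]);  // copy the byte through unchanged
                i++;
            }
        }
        return out.toByteArray();
    }

    // True if pattern occurs in data starting at offset.
    static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        if (offset + pattern.length > data.length) return false;
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Java bytes are signed, so values above 127 need a cast.
        byte[] input = {65, (byte) 232, (byte) 211, 66};
        byte[] pattern = {(byte) 232, (byte) 211};
        System.out.println(Arrays.toString(removeAll(input, pattern))); // [65, 66]
    }
}
```

With real files, the array could come from Files.readAllBytes and go back out through Files.write; neither call takes a charset, which is the point being made here.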
However, when you start interpreting some file (or any sequence of bytes, e.g. a byte array) as text, you can't ignore encoding, because bytes can be converted to characters only by means of some encoding. Sure, it is usually possible not to specify an encoding when using something like FileReader; in that case, however, the encoding is specified implicitly, usually with your locale's encoding as the default. Because of this, it is better to always specify the encoding you intend to use when loading character data from byte streams (e.g. via InputStreamReader), so that the actual encoding does not depend on the system your program runs on.
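To make that concrete, here is a small Java sketch that decodes the same two bytes with two explicitly named charsets via InputStreamReader. The class name ExplicitCharsetDemo and the helper decode are invented for the example, and a ByteArrayInputStream stands in for a file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Decodes bytes as text using the charset the caller names explicitly,
    // so the result does not depend on the platform default.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // These two bytes are the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};
        // Same bytes, different text depending on the charset used to decode them.
        System.out.println(decode(bytes, StandardCharsets.UTF_8).length());      // 1
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

The lengths are printed rather than the strings themselves so the output does not depend on the console's own encoding.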
Upvotes: 1
Reputation: 189517
What I think somebody might have told you is that if you have to choose between encodings, it doesn't matter which one you pick as long as you stick to it.
This obviously ignores issues like the efficiency of the encoding (if one of them stores your typical data in fewer bytes, use that one).
Consider the opposite scenario - you could write in one encoding and then either (a) forget about ever reading the data back in or (b) read the data incorrectly.
To use a contrived example, let's say you cannot use the lowercase letter i in your data file for some reason. So to store it, you need to encode it somehow. You decide to store it as \48. But now, how do you represent the literal sequence \48 unambiguously, should you ever need to? Ah ha, your encoding can accommodate that, too: store any literal backslash as \5C. But of course, when you read the file back in, you have to decode this encoding, or you will end up with the wrong bytes. (ThÁ&s Á&s more common than you may thÁ&nk!)
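The contrived scheme above can be sketched in a few lines of Java. This is only an illustration: the class name EscapeDemo and the methods encode and decode are made up, and the codes \48 and \5C are simply the arbitrary ones picked in the example, not any standard escape syntax.

```java
public class EscapeDemo {
    // Rewrites text so the output never contains a lowercase 'i'.
    // 'i' becomes \48 and a literal backslash becomes \5C.
    static String encode(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == 'i') {
                sb.append("\\48");
            } else if (c == '\\') {
                sb.append("\\5C");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverses encode(): turns \48 back into 'i' and \5C back into a backslash.
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        while (pos < encoded.length()) {
            if (encoded.startsWith("\\48", pos)) {
                sb.append('i');
                pos += 3;
            } else if (encoded.startsWith("\\5C", pos)) {
                sb.append('\\');
                pos += 3;
            } else {
                sb.append(encoded.charAt(pos));
                pos++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "this \\48 is literal"; // contains a literal \48
        String encoded = encode(original);
        System.out.println(encoded);                           // th\48s \5C48 \48s l\48teral
        System.out.println(decode(encoded).equals(original));  // true
    }
}
```

Reading the encoded text without running decode leaves the escape codes sitting in the data, which is the "wrong bytes" outcome described above.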
Upvotes: 3