Reputation: 6924
I understand that byte streams deal with bytes and character streams deal with characters... if I use a byte stream to read in characters, could this limit me to the sorts of characters I might read? For instance, bytes are read in as 8 bit bytes, characters are read in as 16 bit characters... does this mean that more characters can be represented using character streams rather than byte streams?
The last thing I'm confused about is how a byte stream writes out to a file for reading. If I was receiving bytes from a network socket, I would wrap them in an InputStreamReader for writing, this way I would get the character transformation logic the character stream provides. If I read from a file using a FileInputStream and write out using a FileOutputStream, why is this file readable when I open it with a text editor? How is the FileOutputStream treating the bytes?
Upvotes: 3
Views: 300
Reputation: 6452
I think you need to grasp the relation between a byte and a character in order to get your clarification.
The accepted answer to this question is quite clear IMHO : Why does a byte in Java I/O can represent a character?
I'd also check out byte stream and character stream
And if you don't want Joel to catch you and make you peel onions for 6 months in a submarine, just read http://www.joelonsoftware.com/articles/Unicode.html
Upvotes: 1
Reputation: 59576
To answer your questions:
I understand that byte streams deal with bytes and character streams deal with characters... if I use a byte stream to read in characters, could this limit me to the sorts of characters I might read?
Characters are not bytes. A character is stored in one or more bytes according to the selected encoding scheme. It is the encoding scheme, not the kind of stream, that limits or extends the set of characters you can read.
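As a rough illustration of "one or more bytes per character", the sketch below counts how many bytes the same characters occupy under two encodings. The class and method names (EncodingSizes, utf8Length, utf16Length) are made up for this example; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies when encoded as UTF-16 (big endian, no BOM).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding:
        // 'A', e with acute accent, euro sign.
        String[] samples = {"A", "\u00e9", "\u20ac"};
        for (String s : samples) {
            System.out.println(s + ": UTF-8 = " + utf8Length(s)
                    + " byte(s), UTF-16 = " + utf16Length(s) + " byte(s)");
        }
    }
}
```

The same character costs a different number of bytes depending on the encoding, which is why a raw byte count tells you nothing about the character count until you know the encoding.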
For instance, bytes are read in as 8 bit bytes, characters are read in as 16 bit characters... does this mean that more characters can be represented using character streams rather than byte streams?
In a way, yes.
The last thing I'm confused about is how a byte stream writes out to a file for reading. If I was receiving bytes from a network socket, I would wrap them in an InputStreamReader for writing, this way I would get the character transformation logic the character stream provides. If I read from a file using a FileInputStream and write out using a FileOutputStream, why is this file readable when I open it with a text editor? How is the FileOutputStream treating the bytes?
For bytes/data corresponding to characters, you should use an OutputStreamWriter for writing to a file so that it is readable with a text editor. You can specify the encoding at creation, and the stream will perform the encoding of your text data.
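A minimal sketch of that advice, assuming UTF-8 as the target encoding. WriterSketch, writeUtf8 and toUtf8Bytes are hypothetical names; the in-memory ByteArrayOutputStream only stands in for a FileOutputStream so the example runs without touching the disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterSketch {
    // Wraps any byte stream in a Writer with an explicit encoding.
    // Passing a FileOutputStream here would write the encoded text to a file.
    static void writeUtf8(OutputStream out, String text) {
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper: returns the bytes the Writer produced.
    static byte[] toUtf8Bytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8(buffer, text);
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Bytes("caf\u00e9");
        System.out.println(bytes.length + " bytes written for 4 characters");
    }
}
```

A text editor that also assumes UTF-8 will show these bytes as the original text; an editor assuming another encoding may not.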
Upvotes: 0
Reputation: 4812
All I/O streams in Java are just byte streams underneath. Byte-to-character (and vice versa) conversions are done using an encoding. But underneath it all, they are all bytes.
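This is also why a plain FileInputStream to FileOutputStream copy stays readable: the bytes are passed through untouched, so whatever encoding the source had is preserved. A small sketch under that assumption, using in-memory streams in place of the file streams (ByteCopySketch and copy are names invented here):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ByteCopySketch {
    // Copies bytes verbatim, the same loop you would use between
    // a FileInputStream and a FileOutputStream. No decoding happens.
    static byte[] copy(byte[] source) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = {0x68, 0x69}; // "hi" in ASCII/UTF-8
        byte[] copied = copy(original);
        System.out.println("copied " + copied.length + " bytes unchanged");
    }
}
```

The stream never interprets the bytes as characters; the text editor does that later, using whatever encoding it assumes.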
Upvotes: 0
Reputation: 340693
The key concept here is character encoding: each human readable character is somehow encoded into one or more bytes. There are plenty of character encodings. The most popular ones are:
These encodings are readable even when you open a file in a hex editor. However, there are character encodings that do not have this feature, such as UTF-16 and UTF-32.
Now back to your question: InputStream only gives you a stream of bytes. If your bytes represent characters encoded with ASCII or UTF-8, most of the time you are fine. But if these bytes represent something more sophisticated like UTF-16, you absolutely need a Reader. Of course, the reader has to know which character encoding the underlying InputStream provides. This is a common beginner mistake: a Reader not initialized with an explicit character encoding will often fall back to the system default.
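A hedged sketch of passing the encoding explicitly. ReaderSketch, readAll and decode are illustrative names; the second call in main deliberately uses the wrong charset to show what a mismatched default can do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderSketch {
    // Decodes everything from the byte stream using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Same as readAll, but starting from a byte array instead of a stream.
    static String decode(byte[] bytes, Charset charset) {
        return readAll(new ByteArrayInputStream(bytes), charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Correct charset: the original text comes back.
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
        // Wrong charset: each byte of the two-byte sequence becomes its own character.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```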
The other way (with writers) is similar. If you simply cast your chars to bytes, most of the time you will be fine. But if your characters contain less popular national letters, your output will be malformed/truncated. So you create a Writer that converts each given character to a series of one or more bytes. Once again, you are obligated to provide the character encoding.
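To make the truncation concrete, here is a small sketch comparing a naive cast with a real encoding step. CastSketch and castEach are hypothetical names, and String.getBytes with UTF-8 stands in for what a Writer does internally.

```java
import java.nio.charset.StandardCharsets;

class CastSketch {
    // Naive conversion: each 16-bit char is cast to a byte,
    // which silently keeps only the low 8 bits.
    static byte[] castEach(String text) {
        byte[] result = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            result[i] = (byte) text.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // euro sign, code point U+20AC
        byte[] cast = castEach(euro);
        byte[] encoded = euro.getBytes(StandardCharsets.UTF_8);
        // The cast loses the high byte (0x20) and keeps only 0xAC.
        System.out.printf("cast: %d byte(s), first = 0x%02X%n", cast.length, cast[0] & 0xFF);
        System.out.println("UTF-8: " + encoded.length + " byte(s)");
    }
}
```

For plain ASCII text the two approaches produce identical bytes, which is why the cast appears to work until a character outside that range shows up.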
Important rules:
Use InputStream when dealing with binary data (multimedia, ZIP and PDF files, etc.)
Use Reader when reading text (txt, HTML, XML...)
Upvotes: 3
Reputation: 36446
A char is a 16-bit string that represents a Unicode character (strictly, one UTF-16 code unit).
A byte is an 8-bit string that represents a 2's complement number.
The important thing here is that they are both bit strings. Technically speaking, a char is simply 2 bytes. Nothing more, nothing less, aside from some minor semantics with how Java treats the two. As far as the computer (or the Input/OutputStreams) is concerned, the only difference is the number of bits they hold.
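A quick way to check the sizes and the "minor semantics" is sketched below. CharVsByte and its helper names are invented for the example; Character.SIZE, Byte.SIZE and String.codePointCount are standard JDK members. The last two lines of main show one semantic difference worth knowing: some Unicode characters need two chars.

```java
class CharVsByte {
    // Widening a char to int never produces a negative value (char is unsigned).
    static int unsignedCharValue(char c) {
        return c;
    }

    // Widening a byte to int keeps the sign (byte is two's complement).
    static int signedByteValue(byte b) {
        return b;
    }

    // Number of 16-bit char units in the string.
    static int charCount(String text) {
        return text.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        System.out.println("char bits: " + Character.SIZE + ", byte bits: " + Byte.SIZE);
        System.out.println("(byte) 0xFF as int: " + signedByteValue((byte) 0xFF));
        System.out.println("(char) 0xFFFF as int: " + unsignedCharValue((char) 0xFFFF));
        // A character outside the Basic Multilingual Plane, written as a surrogate pair.
        String emoji = "\uD83D\uDE00";
        System.out.println("chars: " + charCount(emoji) + ", code points: " + codePointCount(emoji));
    }
}
```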
Upvotes: 2