corydoras

Reputation: 7270

Java read utf-8 encoded file, character by character

I have a file saved as utf-8 (saved by my application in fact). How do you read it character by character?

File file = new File(folder+name);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis);
DataInputStream dis = new DataInputStream(bis);

The two options seem to be:

char c = dis.readByte()
char c = dis.readChar()

The original file is being written as follows:

File file = File.createTempFile("file", "txt");
FileWriter fstream = new FileWriter(file);
BufferedWriter out = new BufferedWriter(fstream);

Upvotes: 3

Views: 7526

Answers (4)

You should be aware that in the Java world you use streams to process bytes, and readers/writers to process characters. These two are not the same, and you should choose the right one to handle what you have.

Have a look at http://java.sun.com/docs/books/tutorial/i18n/text/stream.html to see how to work with characters in a byte-oriented world.

The Sun Java Tutorial is a highly recommended learning resource.
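To make the byte/character distinction concrete, here is a minimal sketch (not taken from the tutorial). The class name BytesVersusChars and its two helper methods are made up for illustration. It reads the same UTF-8 data once through an InputStream and once through a Reader, and counts how many units each one returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class BytesVersusChars {
    // Reads through a byte stream: each read() returns one raw byte (or -1 at the end).
    static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }

    // Reads through a Reader that decodes UTF-8: each read() returns one char (or -1 at the end).
    static int countChars(byte[] data) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        // "héllo": the é takes two bytes in UTF-8, the other letters one each.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // prints 6
        System.out.println(countChars(data)); // prints 5
    }
}
```

The stream sees six bytes while the reader sees five characters, which is why a byte-oriented class gives the wrong result as soon as the text contains anything outside ASCII.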

Upvotes: 4

dmazzoni

Reputation: 13236

You don't want a DataInputStream; that's for reading binary data (raw bytes and Java primitives), not encoded text. Use an InputStreamReader, which lets you specify the encoding of the input (UTF-8 in your case).
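A minimal sketch of that approach, assuming the file really is UTF-8. The class name Utf8FileReader and the method readCharByChar are hypothetical names chosen for this example; the BufferedReader wrapper is optional and only there to avoid many small reads.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Utf8FileReader {
    // Reads the file at the given path one char at a time, decoding it as UTF-8,
    // and returns everything that was read.
    static String readCharByChar(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(path), StandardCharsets.UTF_8))) {
            int c;
            // read() returns an int: the next char, or -1 at end of stream.
            while ((c = reader.read()) != -1) {
                char ch = (char) c;
                sb.append(ch); // process each character here
            }
        }
        return sb.toString();
    }
}
```

Note that read() returns Java chars (UTF-16 code units), so a character outside the Basic Multilingual Plane arrives as two consecutive chars.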

Upvotes: 7

Tor Valamo

Reputation: 33749

You can read individual bytes. In UTF-8, a byte that is less than 128 (i.e. the 8th bit is 0) is a complete single-byte character; bytes of 128 or more are part of a multi-byte sequence, so you would have to group those yourself.

I'm no Java expert, but I would assume that there are better ways. Maybe some way of telling the reader what encoding it is in...

edit: see dmazzoni's answer.
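For reference, a rough sketch of how the high bits of a UTF-8 byte tell you its role. The class Utf8ByteKind and its kind method are invented for this illustration, and the check only looks at bit patterns; it does not reject every byte value that strict UTF-8 forbids. In practice a Reader with the right charset does all of this for you.

```java
// Hypothetical helper class; the names are illustrative only.
class Utf8ByteKind {
    // Classifies a single byte by its leading bits, as laid out by the UTF-8 encoding.
    static String kind(int b) {
        b &= 0xFF; // treat the value as an unsigned byte
        if ((b & 0x80) == 0x00) return "single";       // 0xxxxxxx: complete ASCII character
        if ((b & 0xC0) == 0x80) return "continuation"; // 10xxxxxx: inside a multi-byte sequence
        if ((b & 0xE0) == 0xC0) return "lead2";        // 110xxxxx: starts a 2-byte sequence
        if ((b & 0xF0) == 0xE0) return "lead3";        // 1110xxxx: starts a 3-byte sequence
        if ((b & 0xF8) == 0xF0) return "lead4";        // 11110xxx: starts a 4-byte sequence
        return "invalid";                              // 11111xxx never appears in UTF-8
    }
}
```

For example, é is encoded as the bytes 0xC3 0xA9, which this sketch labels "lead2" followed by "continuation".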

Upvotes: -1

objects

Reputation: 8677

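Use a Reader (e.g. BufferedReader):

// FileReader(file) decodes with the default charset, which may not be UTF-8 on older JVMs.
Reader reader = new BufferedReader(new FileReader(file));

int c = reader.read(); // read() returns an int: the character, or -1 at end of stream
char ch = (char) c;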

Upvotes: 2
