Reputation: 2243
I am writing a program that looks up the words from one text file (say B) in a dictionary text file (say A), in order to compare the efficiency of different sorting algorithms.
Anyway, my problem arises when one of these source text files contains a special character such as "µ". First of all, to save a text file with such a character in Windows, Notepad says I have to change the encoding from ANSI to something else, like UTF-8.
My program crashes when it encounters a line with a special character, specifically at the point where this word is compared to a word from the other dictionary text file using the compareTo method. It crashes with a NullPointerException.
I printed out the special character and saw that "µ" is represented as "Âµ", and strange characters are always present at the start of the first line ("ï»¿").
I am using a Scanner for file input:
inputStream = new Scanner (new FileInputStream(args[0]));
I have tried a FileReader as well.
In general, how would I read special characters, or words containing special characters? And would these characters be compatible with the built-in compareTo method, or would I have to find another way to order them?
Upvotes: 1
Views: 9539
Reputation: 109547
Do
inputStream = new Scanner(new FileInputStream(args[0]), "UTF-8");
or
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));
InputStreams deal with raw binary bytes; Readers deal with characters, decoded from those bytes using a specific encoding.
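To make the byte/character distinction concrete, here is a small sketch. The class and method names (DecodeDemo, decodeUtf8, decodeLatin1) are made up for illustration, and ISO-8859-1 stands in for the Windows "ANSI" code page, which is an assumption about the asker's setup. It decodes the same two bytes with two different charsets:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same bytes become different characters
// depending on which charset is used to decode them.
public class DecodeDemo {
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ISO-8859-1 is used here as a stand-in for a single-byte "ANSI" code page.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00B5 (micro sign) is encoded in UTF-8 as the two bytes 0xC2 0xB5.
        byte[] bytes = "\u00B5".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + bytes.length);
        // Decoded as UTF-8, the two bytes form one character.
        System.out.println("UTF-8 chars: " + decodeUtf8(bytes).length());
        // Decoded with a single-byte charset, each byte becomes its own character.
        System.out.println("Latin-1 chars: " + decodeLatin1(bytes).length());
    }
}
```

Reading a UTF-8 file without naming the charset can fall back to the platform default, which is most likely why one character shows up as two.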
It seems there is a "BOM" (byte order mark) character in front of the text, a zero-width no-break space (U+FEFF), which serves to mark the text as UTF-8. You could delete it from the file, but then Windows may no longer recognize the file as UTF-8. When reading with the Scanner, you might wish to skip it.
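Skipping the BOM could look something like the sketch below. The names Utf8Words, stripBom and readLines are hypothetical helpers, not part of any library, and the example reads from an in-memory byte array rather than a real file so that it is self-contained. Java's UTF-8 decoder leaves the BOM in place as the character U+FEFF, so it has to be removed by hand from the first line:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for reading UTF-8 word lists that may start with a BOM.
public class Utf8Words {
    /** Removes a leading U+FEFF (the BOM) if present; otherwise returns the line unchanged. */
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    /** Reads all lines from the stream as UTF-8, dropping a BOM on the first line only. */
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            if (line != null) {
                lines.add(stripBom(line));
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file contents: a BOM followed by two words, one containing U+00B5.
        byte[] data = "\uFEFF\u00B5m\nabc\n".getBytes(StandardCharsets.UTF_8);
        List<String> words = readLines(new ByteArrayInputStream(data));
        // Once decoded correctly, the words are ordinary Strings and compareTo works on them.
        System.out.println(words.get(0).compareTo(words.get(1)) > 0);
    }
}
```

With a real file, new FileInputStream(args[0]) would take the place of the ByteArrayInputStream. Note that compareTo orders strings by UTF-16 code unit value, so "µ" sorts after all ASCII letters.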
Upvotes: 2