Reputation: 54619
Real simple question really. I need to read a Unicode text file in a Java program.
I am used to reading plain ASCII text with a BufferedReader/FileReader combo, which obviously does not work here :(
I know that I can read a String in the 'traditional' way using a Buffered Reader and then convert it using something like:
temp = new String(temp.getBytes(), "UTF-16");
But is there a way to wrap the Reader in a 'Converter'?
EDIT: the file starts with FF FE
Upvotes: 15
Views: 64579
Reputation: 93143
Check https://docs.oracle.com/javase/1.5.0/docs/api/java/io/InputStreamReader.html.
I would read source file with something like:
Reader in = new InputStreamReader(new FileInputStream("file"), "UTF-8");
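Since the question mentions that the file starts with FF FE (a UTF-16 little-endian byte order mark), here is a minimal sketch of the same InputStreamReader idea using the UTF-16 charset. The class and method names (Utf16BomDemo, decode) are made up for illustration, and the bytes are held in memory instead of a file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16BomDemo {
    // Hypothetical helper: decodes the whole byte array with the given charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = in.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // FF FE (little-endian BOM) followed by "Hi" encoded as UTF-16LE.
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
        // The UTF-16 decoder reads the BOM to pick the byte order and drops it.
        System.out.println(decode(data, StandardCharsets.UTF_16));
    }
}
```

Note that UTF_16LE and UTF_16BE do not strip the BOM when decoding, so with those charsets the text would start with U+FEFF.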
Upvotes: 10
Reputation: 21
I just had to add "UTF-8" to the creation of the InputStreamReader and special characters could be seen immediately.
InputStreamReader istreamReader = new InputStreamReader(inputStream,"UTF-8");
BufferedReader bufferedReader = new BufferedReader(istreamReader);
Upvotes: 0
Reputation: 1
String s = new String(Files.readAllBytes(Paths.get("file.txt")),"UTF-8");
Upvotes: -1
Reputation: 3106
I would recommend using UnicodeReader from the Google Data API; see this answer to a similar question. It automatically detects the encoding from the byte order mark (BOM).
You may also consider BOMInputStream in Apache Commons IO, which does basically the same thing but does not cover all alternative versions of the BOM.
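To illustrate what BOM detection means in practice, here is a rough standard-library sketch. It is not the implementation used by UnicodeReader or BOMInputStream; the names BomSniffer, open and readAll are hypothetical, and only the UTF-8 and UTF-16 marks are handled (a UTF-32 BOM would be misread).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // Hypothetical helper: picks a charset from the BOM, or uses fallback if there is none.
    static Reader open(InputStream raw, Charset fallback) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {                       // read up to three bytes
            int r = in.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF, b2 = head[2] & 0xFF;
        Charset charset = fallback;
        int bomLength = 0;
        if (n >= 3 && b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && b0 == 0xFF && b1 == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (n >= 2 && b0 == 0xFE && b1 == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }
        if (n > bomLength) {
            in.unread(head, bomLength, n - bomLength);  // push back the non-BOM bytes
        }
        return new InputStreamReader(in, charset);
    }

    // Convenience wrapper for in-memory data; returns the decoded text.
    static String readAll(byte[] data, Charset fallback) {
        StringBuilder out = new StringBuilder();
        try (Reader in = open(new ByteArrayInputStream(data), fallback)) {
            int c;
            while ((c = in.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For a real project the libraries mentioned above are probably the better choice, since they cover more cases than this sketch.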
Upvotes: 2
Reputation: 1
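// Pass the charset explicitly; "UTF-16" uses the FF FE byte order mark to pick the byte order.
Scanner scan = new Scanner(new File("C:\\Users\\daniel\\Desktop\\Corpus.txt"), "UTF-16");
// Pair hasNextLine() with nextLine(); hasNext() checks for a token, not a line.
while (scan.hasNextLine()) {
    System.out.println(scan.nextLine());
}
scan.close();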
Upvotes: -1
Reputation: 108889
Some notes:
Upvotes: 7
Reputation: 8677
You wouldn't wrap the Reader; instead, you would wrap the stream using an InputStreamReader. You could then wrap that with the BufferedReader that you currently use:
BufferedReader in = new BufferedReader(new InputStreamReader(stream, encoding));
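A fuller sketch of that wrapping chain, reading a file line by line. The names EncodedLineReader, readLines and roundTrip are invented for the example, and the temp file exists only so the snippet can run on its own.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class EncodedLineReader {
    // Stream -> InputStreamReader (does the decoding) -> BufferedReader (gives readLine()).
    static List<String> readLines(Path file, Charset encoding) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), encoding))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Hypothetical demo helper: writes text to a temp file in the given encoding, then reads it back.
    static List<String> roundTrip(String text, Charset encoding) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(encoding));
                return readLines(tmp, encoding);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("first\nsecond\n", StandardCharsets.UTF_16));
    }
}
```

The try-with-resources block closes the BufferedReader, which in turn closes the InputStreamReader and the underlying stream.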
Upvotes: 18