I am using Java to walk through a folder and read the files in it. The folder contains only text files, but they use different charsets: some are in ISO-8859-1 and some are in windows-1252.
I need to read all of them and create one single file from their content, so I append each file's content to the output. Here is my code:
    File fiout = new File("single_"+System.currentTimeMillis()+".csv");
    PrintWriter writer = new PrintWriter(fiout);
    for( int x=0; x < all_zipEntries.size(); x++ ){
        File fi = (File)all_zipEntries.get( x );
        String zipfilename = fi.getName();
        String charset = getCharset(fi);
        Charset inputCharset = Charset.forName(charset);
        log.println("Read "+zipfilename+" ... (Charset "+charset+" ... "+inputCharset.toString()+")");
        FileInputStream fis = new FileInputStream(fi.getName());
        InputStreamReader isr = new InputStreamReader(fis, inputCharset);
        BufferedReader in = new BufferedReader(isr);
        while ( in.ready() ) {
            String row = in.readLine();
            writer.println(row);
        }
        in.close();
        isr.close();
        fis.close();
    }
    writer.close();
This is my log:
Read 01.csv ... (Charset ISO-8859-1 ... ISO-8859-1)
Read 02.csv ... (Charset ISO-8859-1 ... ISO-8859-1)
Read 03.csv ... (Charset windows-1252 ... windows-1252)
Read 04.csv ... (Charset windows-1252 ... windows-1252)
Read 05.csv ... (Charset windows-1252 ... windows-1252)
Read 06.csv ... (Charset windows-1252 ... windows-1252)
Read 07.csv ... (Charset windows-1252 ... windows-1252)
Read 08.csv ... (Charset windows-1252 ... windows-1252)
Read 09.csv ... (Charset windows-1252 ... windows-1252)
As you can see, the first two files are detected as ISO-8859-1 and the remaining seven as windows-1252.
My default charset is ISO-8859-1. In the result file created by the code above, some lines look like this:
Äpfel
Äpfel
Äpfel
and other lines look like this:
?pfel
?pfel
The broken lines come from files 3 through 9. It looks as if the content of those files is not converted from windows-1252 to ISO-8859-1 correctly, even though I set the charset when reading!
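For reference, here is a minimal sketch of the same merge with the output charset set explicitly, rather than relying on the JVM default that `new PrintWriter(File)` uses. It is not a confirmed fix: `Ä` is the byte 0xC4 in both ISO-8859-1 and windows-1252, so the `?` may also mean that `getCharset` reports the wrong charset for files 3 through 9. The class name `MergeWithCharsets`, the method `merge`, and the choice of UTF-8 as output are assumptions made for illustration; the per-file charsets are passed in directly instead of being detected.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the code in the question.
public class MergeWithCharsets {

    // Appends every input file to 'target'. Each input is decoded with its own
    // charset; the output is encoded with one explicitly chosen charset.
    static void merge(Map<Path, Charset> inputs, Path target, Charset outputCharset)
            throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, outputCharset)) {
            for (Map.Entry<Path, Charset> entry : inputs.entrySet()) {
                // Open the full path (not just the file name) with the file's charset.
                try (BufferedReader in =
                        Files.newBufferedReader(entry.getKey(), entry.getValue())) {
                    String row;
                    // readLine() returns null at end of file; ready() is not an EOF check.
                    while ((row = in.readLine()) != null) {
                        writer.write(row);
                        writer.newLine();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge-demo");
        Path iso = dir.resolve("01.csv");
        Path win = dir.resolve("03.csv");
        // 0xC4 is 'Ä' in both charsets; 0x80 is the euro sign only in windows-1252.
        Files.write(iso, new byte[] {(byte) 0xC4, 'p', 'f', 'e', 'l', '\n'});
        Files.write(win, new byte[] {(byte) 0x80, '\n'});

        // LinkedHashMap keeps the files in insertion order.
        Map<Path, Charset> inputs = new LinkedHashMap<>();
        inputs.put(iso, StandardCharsets.ISO_8859_1);
        inputs.put(win, Charset.forName("windows-1252"));

        Path out = dir.resolve("single.csv");
        merge(inputs, out, StandardCharsets.UTF_8);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

UTF-8 is used for the output here because it can represent every character from both input charsets, whereas ISO-8859-1 has no encoding for the windows-1252 characters in the 0x80 to 0x9F range (such as the euro sign). If the result file must stay in ISO-8859-1, the same method can be called with `StandardCharsets.ISO_8859_1` as the output charset.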