Reputation: 48091
I have a UTF-8 file (it's a CSV).
I need to read this file line by line, do some replacements, and then write it line by line into another file.
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(fileFix), "ASCII"));
bw.write(""); // clean current file
BufferedReader br = new BufferedReader(new InputStreamReader(
        new FileInputStream(file), "UTF-8"));
String line;
while ((line = br.readLine()) != null) {
    line = line.replace(";", ",");
    bw.append(line + "\n");
}
Simple as that.
The problem is that the output file (fileFix) is UTF-8, and I think it has the BOM character.
How can I write the file as plain ANSI without the BOM?
The error I am getting while reading my file with a piece of software (Weka):
The first line of this file:
Consider that Notepad++ tells me the charset is UTF-8. If I try to convert this file to plain ASCII (with Windows Notepad), those chars disappear.
When you are on the first line, run:
line = line.substring(1);
to remove any BOM char.
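A minimal sketch of that approach (the file contents, temp-file names, and the `stripBom` helper are assumptions for illustration, not from the question): track whether you are on the first line, since only the first line can carry the BOM, and strip a leading U+FEFF there.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class StripBomFirstLine {
    // Drop a leading U+FEFF (BOM) if present; otherwise return the line unchanged.
    static String stripBom(String line) {
        return line.startsWith("\ufeff") ? line.substring(1) : line;
    }

    public static void main(String[] args) throws IOException {
        // Build a UTF-8 input that starts with a BOM, like the question's file.
        File in = File.createTempFile("input", ".csv");
        File out = File.createTempFile("fixed", ".csv");
        try (Writer w = new OutputStreamWriter(new FileOutputStream(in), StandardCharsets.UTF_8)) {
            w.write("\ufeffa;b;c\n1;2;3\n");
        }

        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream(in), StandardCharsets.UTF_8));
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(out), StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first) {
                    line = stripBom(line); // only the first line can carry a BOM
                    first = false;
                }
                bw.write(line.replace(";", ","));
                bw.write("\n");
            }
        }
    }
}
```

Note that this only works if the BOM survives as a single char at the start of the first line, which is the case when the reader decodes the file as UTF-8.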
Upvotes: 2
Views: 7530
Reputation: 3696
Look at http://en.wikipedia.org/wiki/Byte_order_mark for the pattern to replace; for UTF-8 it looks like EF BB BF rather than FE FF.
This solution is wrong; check Jon's answer instead.
Upvotes: 1
Reputation: 1499790
It sounds like this is a BOM issue rather than an encoding issue as such.
You can just remove any BOM characters as you write the file, with:
line = line.replace("\ufeff", "");
That leaves the question of whether you're reading the data accurately in the first place... I'd strongly advise you not to use FileWriter and FileReader at all - instead, use InputStreamReader and OutputStreamWriter, specifying the encoding explicitly for both of them. Set the reader encoding to UTF-8 (assuming the input file really is UTF-8), and set the writer encoding to whatever you want... but I'd recommend sticking with UTF-8, to be honest.
Also note that you should be closing your reader/writer in finally blocks, or using the try-with-resources statement if you're using Java 7.
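Putting that answer together - explicit charsets on both streams, the "\ufeff" replacement, and try-with-resources - a sketch might look like the following. The sample contents, temp-file names, and the `fixLine` helper are assumptions for illustration; the question used variables named `file` and `fileFix`.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CsvFix {
    // Remove any BOM characters and swap the delimiter, as the answer suggests.
    static String fixLine(String line) {
        return line.replace("\ufeff", "").replace(";", ",");
    }

    public static void main(String[] args) throws IOException {
        // Temp files stand in for the question's `file` and `fileFix`.
        File file = File.createTempFile("input", ".csv");
        File fileFix = File.createTempFile("fixed", ".csv");
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write("\ufeffname;age\nalice;30\n");
        }

        // try-with-resources (Java 7+) closes both streams even if an exception is thrown.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream(file), StandardCharsets.UTF_8));
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(fileFix), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(fixLine(line));
                bw.newLine();
            }
        }
    }
}
```

Using StandardCharsets.UTF_8 instead of the string "UTF-8" avoids a checked UnsupportedEncodingException and any risk of a typo in the charset name.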
Upvotes: 5