Reputation: 1230

Strange character at the beginning of an XML file

I'm trying to parse an XML file, but it throws an error. If I print the String with System.out.println, I can see a strange character at the start.

before

ï»¿<?xml version="1.0"

after

?<?xml version="1.0"

I tried changing the charset to UTF-8, but it didn't work. What should I do?

Upvotes: 4

Answers (3)

Diego Macario

Reputation: 1230

For anyone who needs to parse XML and runs into a parsing problem caused by the BOM, the code below worked for me.

You can use BOMInputStream from Apache Commons IO; it does the job for you. I had this problem, and trust me, using this API is much easier. A tip for parsing XML: read the data as an array of bytes, run it through the suggested API, and only then convert it to a String using the UTF-8 charset. That way you will not lose the accents.

Code that turns the source into an InputStream and passes it to the method

String source = FileUtil.takeOffBOM(IOUtils.toInputStream(attachment.getValue()));

Method to take off the BOM

public static String takeOffBOM(InputStream inputStream) throws IOException {
    BOMInputStream bomInputStream = new BOMInputStream(inputStream);
    return IOUtils.toString(bomInputStream, "UTF-8");
}
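
To illustrate why the stray characters show up and why the charset used for decoding matters, here is a minimal sketch using only the JDK. The class name BomDecodingDemo and the sample bytes are made up for this illustration. It decodes the same bytes (a UTF-8 BOM followed by the start of an XML declaration) once as ISO-8859-1 and once as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class BomDecodingDemo {
    // The three UTF-8 BOM bytes (EF BB BF) followed by "<?xml".
    static final byte[] SAMPLE = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'
    };

    // Decoding with a single-byte charset turns each BOM byte into its own
    // character, which is where the "ï»¿" prefix comes from.
    static String decodeAsLatin1() {
        return new String(SAMPLE, StandardCharsets.ISO_8859_1);
    }

    // Decoding as UTF-8 collapses the BOM into one invisible character,
    // U+FEFF, which Java keeps at the start of the String.
    static String decodeAsUtf8() {
        return new String(SAMPLE, StandardCharsets.UTF_8);
    }
}
```

In both cases the XML parser still sees something before the `<`, which is why the BOM has to be removed explicitly rather than just switching the charset.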

Upvotes: 3

Josh

Reputation: 1553

You have a UTF-8 string (which is why Notepad++ is recognizing it as such), but UTF-8 doesn't require a BOM. Some programs write one; some don't. This leads to occasional confusion when reading files: some readers (like the one you're using in your Java code) don't recognize the BOM and therefore don't skip it. I'd recommend something like the accepted answer to this question or this one for removing it. Make sure you check whether the first 3 bytes actually are a BOM before removing them, rather than blindly dropping 3 bytes from every incoming string.
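
The check described above could be sketched roughly as follows, using only the JDK. The class and method names (Utf8BomStripper, startsWithBom, decodeWithoutBom) are hypothetical, and the sketch assumes the input is available as a byte array and is meant to be UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Utf8BomStripper {
    // The UTF-8 encoding of U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True only if the data is at least 3 bytes long and starts with EF BB BF.
    static boolean startsWithBom(byte[] data) {
        return data.length >= UTF8_BOM.length
            && data[0] == UTF8_BOM[0]
            && data[1] == UTF8_BOM[1]
            && data[2] == UTF8_BOM[2];
    }

    // Decodes the bytes as UTF-8, skipping the BOM only when it is present.
    static String decodeWithoutBom(byte[] data) {
        int offset = startsWithBom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```

Input without a BOM passes through unchanged, so the helper is safe to apply to every incoming file.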

Upvotes: 4

user2987828

Reputation: 1137

A lot of utilities produce such an odd initial character.

You may use Java code to skip every character before the first "<". If the XML file is your own, you can fix it for good with, for example:

vi # no filename here; we first need to switch to binary mode.
:set binary
:e filename.containing.your.xml
dt<:w
:q!

Upvotes: 1
