Sembrano

Reputation: 1147

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence (XML)

I am getting this exception, and the problem seems to be caused by UTF-8 encoded characters in my input.

The top of my XML file says: <?xml version="1.0" encoding="UTF-8" ?>. I have also set the encoding when writing the file, but I still get this exception.

    // Set the output format and declare UTF-8 as the encoding
    Format format = Format.getPrettyFormat();
    format.setEncoding("UTF-8");
    XMLOutputter xmlOutput = new XMLOutputter(format);
    // Create a new file and write the XML to it, closing the stream afterwards
    File target = new File(XMLEditorService.getXMLEditorService().getFile());
    try (FileOutputStream out = new FileOutputStream(target)) {
        xmlOutput.output(doc, out);
    }

The error seems to occur when I parse the file:

Document xmlDocument = builder.parse(file);

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

How can I solve this?

Upvotes: 2

Views: 11365

Answers (2)

Spenhouet

Reputation: 7159

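I had the same problem. In my case I had created the new XML file with JDOM and a FileWriter(xmlFile). FileWriter(xmlFile) writes in the platform default encoding, which is not necessarily UTF-8, so the bytes on disk may not match the encoding declared in the XML header. Using a FileOutputStream(xmlFile) instead solved it.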
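
As a rough illustration of the difference, the sketch below writes XML text through an OutputStreamWriter with an explicit UTF-8 charset, so the bytes on disk do not depend on the platform default. It uses only the JDK rather than JDOM; the class name Utf8XmlWriter, the method writeUtf8, and the sample file name are hypothetical and chosen for this example only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JDOM or of the original question.
final class Utf8XmlWriter {

    // Writes the given XML text to the file, always encoding it as UTF-8.
    static void writeUtf8(File file, String xml) throws IOException {
        // The charset is stated explicitly instead of relying on the
        // platform default that FileWriter(File) would use.
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        // "\u00e9" is a non-ASCII character, so the encoding actually matters.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<name>Ren\u00e9</name>\n";
        writeUtf8(new File("example.xml"), xml);
    }
}
```

With JDOM, passing an OutputStream to XMLOutputter.output, as the question already does, lets the outputter apply the encoding set on the Format itself.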

Upvotes: 1

Michael Kay

Reputation: 163262

You're telling the parser that the file is encoded in UTF-8, and the parser is telling you that it isn't. I'm inclined to believe the parser.

There are two approaches to diagnosis:

(a) examine the file at the binary level to see what the actual octets are, and what the actual encoding is.

(b) study how the file came into being and how badly-encoded characters might have come to be there.
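
For approach (a), one possible check is to run the file's bytes through a strict UTF-8 decoder and see where it first fails. The following is a minimal sketch using only the JDK; the class name Utf8Check and the method firstInvalidOffset are hypothetical names made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of Xerces or JDOM.
final class Utf8Check {

    // Returns the byte offset at which strict UTF-8 decoding first fails,
    // or -1 if the whole array decodes cleanly.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position();   // decoding stopped at the bad sequence
            }
            if (result.isUnderflow()) {
                return -1;              // all input consumed without errors
            }
            out.clear();                // overflow: discard decoded chars, continue
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(bytes);
        System.out.println(offset < 0
                ? "File is well-formed UTF-8"
                : "Decoding fails near byte offset " + offset);
    }
}
```

If the check reports a failure, looking at the octets around that offset in a hex viewer usually shows which encoding the file was really written in, which in turn helps with approach (b).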

Upvotes: 1
