poeschlorn

Reputation: 12440

MalformedByteSequenceException while trying to parse XML

I have the following .gpx data from Wikipedia:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" creator="byHand" version="1.1" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <wpt lat="39.921055008" lon="3.054223107">
    <ele>12.863281</ele>
    <time>2005-05-16T11:49:06Z</time>
    <name>Cala Sant Vicenç - Mallorca</name>
    <sym>City</sym>
  </wpt>
</gpx>

When I call my parsing method, I get an exception (see below). The call looks like this:

Document tmpDoc = getParsedXML(currentGPX);

My parsing method looks like this (standard parsing code, nothing exciting):

    public static Document getParsedXML(String fileWithPath) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db;
        Document doc = null;
        try {
            db = dbf.newDocumentBuilder();
            doc = db.parse(new File(fileWithPath));
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return doc;
    }

This simple code throws the following exception:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at Zeugs.getParsedXML(Zeugs.java:38)
at Zeugs.main(Zeugs.java:25)

I guess the error lies within the format of the first file, but I don't know where exactly. Can you please give me a hint?

Upvotes: 1

Views: 6139

Answers (2)

Mmmh mmh

Reputation: 5460

I had the same error in one of my programs, but it only happened when running the jar from the Windows console. On Linux or in Eclipse (right-click on the main class file > Run As > Java Application) the error did not occur.

I guess this is because of the default encoding on Windows (Cp..) versus UTF-8 on Linux and in Eclipse. To change the default when running the jar, simply pass the -Dfile.encoding=UTF8 parameter to the JVM:

java -Dfile.encoding=UTF8 -jar myjar.jar

One reason the program might depend on this parameter is that the encoding was not specified explicitly when creating input streams or readers, so they fall back to the platform default.
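To make the point about explicit encodings concrete, here is a minimal sketch (not part of the original answer) that hands the parser a Reader created with an explicitly chosen charset. When a character stream is supplied through an InputSource, the parser reads the characters directly and disregards the encoding named in the XML declaration. The class name ExplicitEncodingParse and its parse method are hypothetical, and the choice of ISO-8859-1 in the demo is only an assumption about how such a file might really have been saved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML using a caller-supplied charset
// instead of relying on the declaration or the platform default.
class ExplicitEncodingParse {

    static Document parse(InputStream in, Charset charset) {
        try {
            // Decode the bytes ourselves; the parser then sees characters only.
            Reader reader = new InputStreamReader(in, charset);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The declaration claims UTF-8, but the bytes below are ISO-8859-1
        // (assumed here purely for demonstration). \u00e7 is the letter c with cedilla.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>Cala Sant Vicen\u00e7 - Mallorca</name>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(latin1Bytes),
                StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only helps if you know the real encoding of the file; passing the wrong charset will parse without an exception but produce garbled characters.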

Upvotes: 2

ChrisBD

Reputation: 9209

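I would suggest that your file hasn't actually been saved in UTF-8 format, even though its XML declaration says encoding="UTF-8". The ç in the waypoint name is the likely culprit: in ISO-8859-1 or Windows-1252 it is stored as the single byte 0xE7, which a UTF-8 decoder treats as the first byte of a 3-byte sequence, so the byte that follows is rejected as invalid. Re-saving the file as UTF-8, or changing the declaration to match the file's real encoding, should fix it.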
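One way to test this suggestion is to check whether the bytes decode as strict UTF-8. The following is a minimal sketch, not part of the original answer: the class name Utf8Check and the method isValidUtf8 are hypothetical, and the sample bytes are built in memory from the waypoint name in the question rather than read from the actual file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a byte array is well-formed UTF-8.
class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting
        // a replacement character for malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00e7 is the letter c with cedilla from the waypoint name.
        String name = "Cala Sant Vicen\u00e7 - Mallorca";

        // Encoded as UTF-8, the cedilla becomes the two bytes 0xC3 0xA7.
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.UTF_8)));      // prints true

        // Encoded as ISO-8859-1, it is the single byte 0xE7 followed by a space,
        // which is not a valid UTF-8 sequence.
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.ISO_8859_1))); // prints false
    }
}
```

To check a real file, the same method could be applied to the result of java.nio.file.Files.readAllBytes for that file's path.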

Upvotes: 5
