user220755

Reputation: 4446

Setting the encoding on an inputstream

I'm processing xml in Java and I have the following code:

  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  dbf.setValidating(false);
  dbf.setIgnoringComments(false);
  dbf.setIgnoringElementContentWhitespace(true);
  dbf.setNamespaceAware(true);

  DocumentBuilder db = dbf.newDocumentBuilder();
  db.setEntityResolver(new NullResolver());
  _logger.error("Before processing the input stream");
  processXml(db.parse(is));

Here the variable (is) is an InputStream.

This results in the following error:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8

That sounds like an error caused by reading the input with the wrong encoding. I would like to set the encoding on the InputStream, but I am not sure how. I found ways to set the encoding on an InputSource or an InputStreamReader, but db.parse does not take a Reader/InputSource.

What is the best way to fix this?

Thanks!


Answers (1)

Don Roby

Reputation: 41155

DocumentBuilder.parse can take an InputSource. See the javadocs.

So you should try wrapping your InputStream in an InputStreamReader (where you can specify the character set) and then creating an InputSource from that reader.

It's a bit convoluted, but these things happen in Java.

Something along the lines of
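this minimal sketch. It assumes the bytes are really ISO-8859-1 (a guess; use whatever charset the producer of the stream actually wrote), and the class name EncodingAwareParse and the sample document are hypothetical. Because the InputSource wraps a character stream, the parser reads the already-decoded characters and, per the InputSource javadoc, disregards any encoding declaration in the XML prolog.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class illustrating the InputStreamReader -> InputSource route.
public class EncodingAwareParse {

    // Decode the raw bytes with an explicit charset, then give the parser
    // a character stream instead of a byte stream.
    static Document parse(InputStream is, Charset charset)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        Reader reader = new InputStreamReader(is, charset); // charset chosen here
        InputSource source = new InputSource(reader);       // wraps the Reader
        return db.parse(source);                             // parse(InputSource) overload
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: "café" encoded as ISO-8859-1, so 'é' is the single
        // byte 0xE9, which is not a valid UTF-8 sequence at that position.
        byte[] latin1 = "<?xml version=\"1.0\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Passing the same bytes straight to db.parse(InputStream) would make the parser fall back to UTF-8 (no encoding is declared) and fail on the 0xE9 byte, which is the kind of malformed-byte error shown in the question.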

