user398384

Reputation: 1204

java.nio.charset.MalformedInputException when reading a stream

I use the following code to read data, and it throws java.nio.charset.MalformedInputException. I can open the file normally, but it does include non-ASCII characters. Is there any way I can fix this problem?

Source.fromInputStream(stream).getLines foreach { line =>
  // store items on the fly
  lineParser(line.trim) match {
    case None => // no-op
    case Some(pair) => // some-op
  }
}
stream.close()

The stream construction code is here:

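import java.net.URL

// Returns Some(input stream) for the URL at `path`, or None if it does not exist.
// fileExists is a helper that is not shown in the question.
def getStream(path: String) = {
  if (!fileExists(path)) {
    None
  } else {
    val fileURL = new URL(path)
    val urlConnection = fileURL.openConnection
    Some(urlConnection.getInputStream())
  }
}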

Upvotes: 9

Views: 6922

Answers (2)

Alex Cruise

Reputation: 7979

Jean-Laurent is very likely right that Source.fromInputStream is using an encoding that doesn't match your stream, probably the platform default: windows-1252 (close to ISO-8859-1) on Western-locale Windows, UTF-8 on recent Linux distros, and IIUC MacRoman on older Macs. Since you got an encoding exception, it was probably defaulting to UTF-8, which is a fairly rigid scheme and rejects byte sequences that aren't valid UTF-8, while the file was in some other encoding (most likely ISO-8859-1).
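To make the mismatch concrete, here is a minimal sketch that reproduces it in memory. The object name, the `decode` helper, and the sample string are made up for illustration; the assumption is that your file is ISO-8859-1 and is being decoded as UTF-8, which may not be what is actually happening on your machine.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.MalformedInputException
import scala.io.{Codec, Source}

object EncodingMismatchDemo {
  // "café au lait" encoded as ISO-8859-1: the é becomes the single byte 0xE9
  val latin1Bytes: Array[Byte] = "caf\u00e9 au lait".getBytes("ISO-8859-1")

  // Decode the bytes with the named charset; Left holds the decoding failure, if any
  def decode(bytes: Array[Byte], charsetName: String): Either[MalformedInputException, String] =
    try Right(Source.fromInputStream(new ByteArrayInputStream(bytes))(Codec(charsetName)).mkString)
    catch { case e: MalformedInputException => Left(e) }

  def main(args: Array[String]): Unit = {
    // 0xE9 followed by a space is not a valid UTF-8 sequence, so this yields a Left
    println(decode(latin1Bytes, "UTF-8"))
    // Decoding with the charset the bytes were written in yields a Right
    println(decode(latin1Bytes, "ISO-8859-1"))
  }
}
```

The same bytes decode fine or blow up depending only on which codec is handed to Source.fromInputStream, which is why the file can look perfectly normal in an editor.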

Broadly, there's no way to tell a priori what character encoding was used to generate a given byte stream; you need some out-of-band mechanism to communicate it. In the case of HTTP responses, you can often get it from the charset parameter of the Content-Type header, though web apps sometimes get it wrong. If the file is XML, it commonly declares its encoding in the XML declaration at the top. Some file formats specify a single standard encoding... It's all over the map really.
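Since the question builds the stream from a URLConnection, one option is to read the charset from the Content-Type header when the server sends one. This is only a sketch: `ContentTypeCharset` and `charsetOf` are hypothetical names, the regex covers the common `charset=...` form rather than every legal header, and the UTF-8 fallback is an assumption you may want to change.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object ContentTypeCharset {
  // Matches a charset parameter, optionally quoted, case-insensitively
  private val CharsetParam = """(?i)charset\s*=\s*"?([^\s;"]+)"?""".r

  // Returns the charset named in a Content-Type value such as
  // "text/plain; charset=ISO-8859-1", or the fallback when the header is
  // null, has no charset parameter, or names a charset the JVM does not know.
  def charsetOf(contentType: String, fallback: Charset = StandardCharsets.UTF_8): Charset =
    Option(contentType)
      .flatMap(ct => CharsetParam.findFirstMatchIn(ct))
      .flatMap(m => Try(Charset.forName(m.group(1))).toOption)
      .getOrElse(fallback)
}
```

With the connection from getStream this could be used as Source.fromInputStream(urlConnection.getInputStream)(scala.io.Codec(ContentTypeCharset.charsetOf(urlConnection.getContentType))). For file: URLs there is usually no declared charset, so the fallback is what ends up being used.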

Your best bet, in the absence of any integration requirement, is to use UTF-8 explicitly everywhere and not rely on the platform default encoding.

Upvotes: 5

huynhjl

Reputation: 41646

Try Source.fromInputStream(stream)(io.Codec("UTF-8")) or whatever charset you need.
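A slightly fuller sketch of the same idea, in case it helps. `ReadWithCodec`, `readLines`, and `readLinesLenient` are names invented here, and the charset name is whatever your data actually uses. The lenient variant assumes you would rather get U+FFFD replacement characters than an exception, which is not always what you want.

```scala
import java.io.InputStream
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

object ReadWithCodec {
  // Reads all lines with an explicit charset instead of the platform default.
  // Closing the Source also closes the underlying stream.
  def readLines(stream: InputStream, charsetName: String): List[String] = {
    val source = Source.fromInputStream(stream)(Codec(charsetName))
    try source.getLines().map(_.trim).toList
    finally source.close()
  }

  // Lenient variant: malformed or unmappable input is replaced rather than
  // reported, so a wrong charset guess garbles text instead of throwing.
  def readLinesLenient(stream: InputStream, charsetName: String): List[String] = {
    val codec = Codec(charsetName)
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    val source = Source.fromInputStream(stream)(codec)
    try source.getLines().map(_.trim).toList
    finally source.close()
  }
}
```

For example, ReadWithCodec.readLines(stream, "ISO-8859-1") would replace the Source.fromInputStream(stream).getLines call in the question, with the per-line parsing applied to the returned list. Note that this version reads everything into memory first, so for very large files you may prefer to keep the foreach inside the try block.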

Upvotes: 15
