Algorithman

Reputation: 1334

Read from GZIPInputStream to String without using Source

I am using Scala. I need to read a large gzip file, turn it into a string, and remove the first line. This is how I open the file:

val fis = new FileInputStream(filename)
val gz  = new GZIPInputStream(fis)

I then tried Source.fromInputStream(gz).getLines.drop(1).mkString(""), but it causes an out-of-memory error.

So I am thinking of reading the stream line by line, perhaps into a byte array, and converting it into a single String at the end.

But I have no idea how to do this. Any suggestions? A better method is also welcome.

Upvotes: 0

Views: 1336

Answers (1)

Artavazd Balayan

Reputation: 2423

If your gzipped file is huge, you can use a BufferedReader. The example below copies all characters from the gzipped file to an uncompressed output file, skipping the first line, so the full text is never held in memory at once.

import java.util.zip.GZIPInputStream
import java.io._
import java.nio.charset.StandardCharsets

import scala.annotation.tailrec
import scala.util.Try

val bufferSize = 4096
val pathToGzFile = "/tmp/text.txt.gz"
val pathToOutputFile = "/tmp/text_without_first_line.txt"
val charset = StandardCharsets.UTF_8

val inStream = new FileInputStream(pathToGzFile)
val outStream = new FileOutputStream(pathToOutputFile)

try {
  val inGzipStream = new GZIPInputStream(inStream)
  val inReader = new InputStreamReader(inGzipStream, charset)
  val outWriter = new OutputStreamWriter(outStream, charset)
  val bufferedReader = new BufferedReader(inReader)

  val closeables =  Array[Closeable](inGzipStream, inReader, 
    outWriter, bufferedReader)
  // Read the first line here so that copy never sees it, i.e. it is skipped
  val firstLine = bufferedReader.readLine()
  println(s"First line: $firstLine")

  @tailrec
  def copy(in: Reader, out: Writer, buffer: Array[Char]): Unit = {
    // Copy while it's not end of file
    val readChars = in.read(buffer, 0, buffer.length)
    if (readChars > 0) {
      out.write(buffer, 0, readChars)
      copy(in, out, buffer)
    }
  }

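  // Copy chars from bufferedReader to outWriter using the buffer
  copy(bufferedReader, outWriter, Array.ofDim[Char](bufferSize))

  // Flush explicitly so a failed write is not silently swallowed by Try below
  outWriter.flush()

  // Close all closeables
  closeables.foreach(c => Try(c.close()))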
}
finally {
  Try(inStream.close())
  Try(outStream.close())
}
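
If you do need the result as a single String, as the question asks, the same reader can feed a StringBuilder instead of a file. The following is only a minimal sketch: the names GzipText and gunzipSkippingFirstLine are made up for illustration, and it assumes the decompressed text fits in the JVM heap. If it does not, no approach that ends in one String will work, and writing to a file as above is the safer route. Unlike mkString(""), this keeps the line terminators of the remaining lines.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}
import java.util.zip.GZIPInputStream

// Hypothetical helper object, not part of the original answer.
object GzipText {
  // Decompresses `in`, discards the first line, and returns the rest as one String.
  // Assumption: the decompressed content fits in memory.
  def gunzipSkippingFirstLine(in: InputStream,
                              charset: Charset = StandardCharsets.UTF_8): String = {
    val reader = new BufferedReader(
      new InputStreamReader(new GZIPInputStream(in), charset))
    try {
      reader.readLine() // consume and drop the first line
      val sb = new java.lang.StringBuilder
      val buffer = new Array[Char](4096)
      var n = reader.read(buffer, 0, buffer.length)
      while (n != -1) { // read returns -1 at end of stream
        sb.append(buffer, 0, n)
        n = reader.read(buffer, 0, buffer.length)
      }
      sb.toString
    } finally {
      reader.close() // also closes the wrapped gzip and input streams
    }
  }
}
```

With the file from the example above, the call would look like GzipText.gunzipSkippingFirstLine(new java.io.FileInputStream("/tmp/text.txt.gz")). Reading in fixed-size chunks avoids creating one String object per line, but the final String still has to fit in memory.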

Upvotes: 2
