Streaming download multiple files from S3 as zip through Akka HTTP or Play

Question

I have an S3 structure that's the result of a Spark job that writes partitioned CSV files like below.

bucketA
  output
    cleaned-data1
      part000....csv
      part001....csv
      part002....csv
    cleaned-data2
      .....

What I need is to be able have an Akka HTTP endpoint that points to the output file name to download all parts as a zip file: https://..../download/cleaned-data1.

When this endpoint is called, ideally I want to:

Open a zip stream from the server to the client browser
Open up the part files and stream the bytes into the zip stream directly to the client without any buffering on the server to avoid memory issue

The total size of all parts can get up to 30GB uncompressed.

Is there a way to do this through Akka Stream, Akka HTTP or Play? Can I utilize the Alpakka library?

Edited temporary based on Ramon's answer:

  def bucketNameToFileContents(bucket : String) : Source[ByteString, _] =
    bucketNameToKeySource(bucket)
      .map(key => S3.download(bucket, key))
      .map(x => x.map(y => y.fold(Source.empty[ByteString])(_._1)))
      .flatMapConcat(identity)
      .flatMapConcat(identity)

Streaming download multiple files from S3 as zip through Akka HTTP or Play

Answers (1)

Related Questions