Makrushin Evgenii

Reputation: 1211

How to upload a DataFrame as a stream without saving it to disk?

I want to upload a DataFrame to a server as a gzip-encoded CSV file without saving it to disk.

It is easy to write a gzip-encoded CSV file to disk using the spark-csv library:

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save(s"result.csv.gz")

But I have no idea how to get an Array[Byte] representing my DataFrame that I could then upload via HTTP.
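
The only workaround I can think of is collecting everything to the driver and gzipping it in memory, roughly like the sketch below (toGzippedCsvBytes is just a name I made up; it assumes the data fits in driver memory and that the values need no CSV quoting), but that does not feel like a proper streaming solution:

    import java.io.ByteArrayOutputStream
    import java.nio.charset.StandardCharsets
    import java.util.zip.GZIPOutputStream

    import org.apache.spark.sql.DataFrame

    // Rough sketch: build the CSV text on the driver and gzip it entirely in memory
    def toGzippedCsvBytes(df: DataFrame): Array[Byte] = {
      val header = df.columns.mkString(",")
      val rows   = df.collect().map(_.mkString(","))  // no quoting/escaping here
      val csv    = (header +: rows).mkString("\n")

      val buffer = new ByteArrayOutputStream()
      val gzip   = new GZIPOutputStream(buffer)
      gzip.write(csv.getBytes(StandardCharsets.UTF_8))
      gzip.close()
      buffer.toByteArray                              // the bytes I would like to POST
    }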

Upvotes: 1

Views: 73

Answers (1)

randal25

Reputation: 1330

You could write to your remote server as if it were a remote HDFS server. You would need HDFS installed on the remote machine, but after that you should be able to do something like:

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("hdfs://your_remote_server_hostname_or_ip/result.csv.gz")

Upvotes: 3
