Makrushin Evgenii

Reputation: 1211

How to upload a DataFrame as a stream without saving it to disk?

I want to upload a DataFrame to a server as a gzip-encoded CSV file without saving it to disk.

It is easy to write a gzip-encoded CSV file to disk using the spark-csv library:

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save(s"result.csv.gz")

But I have no idea how to get an Array[Byte] representing my DataFrame that I could then upload via HTTP.
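
The only workaround I can think of is collecting everything to the driver and gzipping it in memory, roughly like the sketch below (toGzippedCsvBytes is just a name I made up; it assumes the data fits in driver memory and that the values need no CSV quoting), but that does not feel like a proper streaming solution:

    import java.io.ByteArrayOutputStream
    import java.nio.charset.StandardCharsets
    import java.util.zip.GZIPOutputStream

    import org.apache.spark.sql.DataFrame

    // Rough sketch: build the CSV text on the driver and gzip it entirely in memory
    def toGzippedCsvBytes(df: DataFrame): Array[Byte] = {
      val header = df.columns.mkString(",")
      val rows   = df.collect().map(_.mkString(","))  // no quoting/escaping here
      val csv    = (header +: rows).mkString("\n")

      val buffer = new ByteArrayOutputStream()
      val gzip   = new GZIPOutputStream(buffer)
      gzip.write(csv.getBytes(StandardCharsets.UTF_8))
      gzip.close()
      buffer.toByteArray                              // the bytes I would like to POST
    }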

Upvotes: 1

Views: 73

Answers (1)

randal25

Reputation: 1330

You could write to your remote server as if it were a remote HDFS server. You would need HDFS installed on the remote machine, but after that you should be able to do something like:

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("hdfs://your_remote_server_hostname_or_ip/result.csv.gz")

Upvotes: 3
