Reputation: 1211
I want to upload a DataFrame to a server as a CSV file with Gzip encoding, without saving it to disk.
It is easy to write a CSV file with Gzip encoding using the spark-csv library:
df.write
.format("com.databricks.spark.csv")
.option("header", "true")
.option("codec", "org.apache.hadoop.io.compress.GzipCodec")
.save("result.csv.gz")
But I have no idea how to get an Array[Byte] representing my DataFrame, which I could then upload via HTTP.
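For a DataFrame small enough to collect to the driver, one possible in-memory sketch looks like this. The `gzipBytes` helper is hypothetical (standard JDK classes only), and the naive row-to-CSV join below ignores quoting and escaping of commas, so it is an illustration of the idea rather than a robust serializer:

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

// Hypothetical helper: gzip a CSV string entirely in memory,
// returning the compressed bytes without touching the disk.
def gzipBytes(csv: String): Array[Byte] = {
  val buf = new ByteArrayOutputStream()
  val gz  = new GZIPOutputStream(buf)
  gz.write(csv.getBytes(StandardCharsets.UTF_8))
  gz.close() // finishes the gzip stream and writes the trailer
  buf.toByteArray
}

// With Spark (assumption -- this collects everything to the driver,
// so it only works for small DataFrames):
//   val header  = df.columns.mkString(",")
//   val rows    = df.collect().map(_.mkString(",")).mkString("\n")
//   val payload = gzipBytes(header + "\n" + rows)
//   // `payload` is the Array[Byte] to send in the HTTP request body
```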
Upvotes: 1
Views: 73
Reputation: 1330
You could write to your remote server as if it were a remote HDFS server. You would need HDFS installed on the remote server, but after that you should be able to do something like:
df.write
.format("com.databricks.spark.csv")
.option("header", "true")
.option("codec", "org.apache.hadoop.io.compress.GzipCodec")
.save("hdfs://your_remote_server_hostname_or_ip/result.csv.gz")
Upvotes: 3