Reputation: 1982
I'm saving a Spark RDD to an S3 bucket using the following code. Is there a way to compress the output (in gzip format) instead of saving it as plain text?
help_data.repartition(5).saveAsTextFile("s3://help-test/logs/help")
Upvotes: 9
Views: 6838
Reputation: 330163
The saveAsTextFile method takes an optional argument that specifies the compression codec class:
help_data.repartition(5).saveAsTextFile(
    path="s3://help-test/logs/help",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"
)
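With the gzip codec, each partition is written as its own part file with a .gz extension, so repartition(5) should produce five of them. To sanity-check the result, here is a minimal stdlib-only sketch that reads the lines back, assuming the output has already been copied to a local directory. The helper name read_gzip_parts and the local directory are hypothetical and not part of Spark.

```python
import glob
import gzip
import os


def read_gzip_parts(output_dir):
    """Yield lines from every part-*.gz file in output_dir, in part order.

    Hypothetical helper for checking output locally; not a Spark API.
    """
    pattern = os.path.join(output_dir, "part-*.gz")
    for part_path in sorted(glob.glob(pattern)):
        # Text mode ("rt") decompresses and decodes in one step.
        with gzip.open(part_path, "rt", encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Inside Spark no extra step is needed to read the data back: sc.textFile on the same path decompresses .gz files transparently. Keep in mind that gzip files are not splittable, so each part file is read by a single task.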
Upvotes: 15