Kodo

Reputation: 551

Specify options while saving Spark DataFrame as Parquet

I use MongoSpark to read JSON data from a MongoDB database into a Spark DataFrame. Now I want to write the data in that DataFrame out as Parquet files, and that part works like a charm. However, I'm struggling to set compression-related options for the generated Parquet files: I'd like to use Snappy as the codec, and I'd also like to produce "larger" files by specifying the block size. I've lost count of how many approaches I've tried so far. I thought this would be straightforward, just a matter of chaining some .option(...) calls onto the DataFrame.write() method, but so far I've been unsuccessful.
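
In essence, what I've been attempting looks roughly like the following sketch. The option keys and values here are only examples of the variants I've tried, and the output path is a placeholder:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val df = MongoSpark.load(spark)  // assumes spark.mongodb.input.uri is configured

    // Chain the desired Parquet options onto the writer.
    df.write
      .option("compression", "snappy")                   // the codec I'd like to use
      .option("parquet.block.size", 512L * 1024 * 1024)  // one of the block-size keys I've tried
      .parquet("/path/to/output")                        // placeholder path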

What am I doing wrong here?

Upvotes: 1

Views: 5807

Answers (1)

Assaf Mendelson

Reputation: 13001

You have two options:

  1. Set the spark.sql.parquet.compression.codec configuration in Spark to snappy. This has to be done before the Spark session is created, either in the config you build the session from or by changing the default configuration file (see the sketch after this list).
  2. df.write.option("compression", "snappy").parquet(filename)
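
A minimal sketch of the first option, assuming the session is built in code (the builder chain and the example write are illustrative; the config key/value pair is the part that matters):

    import org.apache.spark.sql.SparkSession

    // Set the codec at session level; this must happen before the session is created.
    val spark = SparkSession.builder()
      .config("spark.sql.parquet.compression.codec", "snappy")
      .getOrCreate()

    // Any subsequent Parquet write from this session uses Snappy by default.
    spark.range(10).write.parquet("/tmp/snappy-example")  // placeholder data and path

The second option sets the codec on a single write and takes precedence over the session-level setting if both are present.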

Upvotes: 2
