Reputation: 181
How to specify the Parquet block size and page size in PySpark? I have searched everywhere but cannot find any documentation for the required function calls or imports.
Upvotes: 3
Views: 1230
According to the spark-user mailing list archives, in Scala you would set:
sc.hadoopConfiguration.setInt("dfs.blocksize", some_value)
sc.hadoopConfiguration.setInt("parquet.block.size", some_value)
so the equivalent in PySpark is:
sc._jsc.hadoopConfiguration().setInt("dfs.blocksize", some_value)
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", some_value)
Upvotes: 5