Utkarsh

Reputation: 181

Specify Parquet properties in PySpark

How do I specify the Parquet block size and page size in PySpark? I have searched everywhere but cannot find any documentation for the relevant function calls or imports.

Upvotes: 3

Views: 1230

Answers (1)

user6022341

According to the spark-user mailing list archives, in Scala you set these on the Hadoop configuration:

sc.hadoopConfiguration.setInt("dfs.blocksize", some_value)
sc.hadoopConfiguration.setInt("parquet.block.size", some_value)

So in PySpark, where the Hadoop configuration is reached through the underlying Java context:

sc._jsc.hadoopConfiguration().setInt("dfs.blocksize", some_value)
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", some_value)
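The lines above cover block size only. For page size, a minimal sketch follows, assuming the parquet-hadoop property name `parquet.page.size` (it is not mentioned in the archive quoted above). The helper name `set_parquet_sizes` is made up for illustration, and `sc._jsc` is a private PySpark attribute, so this may change between versions.

```python
def set_parquet_sizes(sc, block_size_bytes, page_size_bytes=None):
    """Set Parquet block (and optionally page) size on the Hadoop
    configuration behind the given SparkContext.

    Hypothetical helper, not part of PySpark.
    """
    # sc._jsc is the Java SparkContext wrapper; it is a private attribute.
    conf = sc._jsc.hadoopConfiguration()

    # Same two properties as in the answer above.
    conf.setInt("dfs.blocksize", block_size_bytes)
    conf.setInt("parquet.block.size", block_size_bytes)

    # Assumption: "parquet.page.size" is the page size property
    # read by the Parquet writer.
    if page_size_bytes is not None:
        conf.setInt("parquet.page.size", page_size_bytes)

    return conf


# Example usage, assuming an active SparkSession named `spark`
# and a DataFrame `df` (both hypothetical here):
#   set_parquet_sizes(spark.sparkContext, 128 * 1024 * 1024, 1024 * 1024)
#   df.write.parquet("/tmp/out")
```

Set the values before calling `df.write.parquet(...)`, since the writer reads them from the Hadoop configuration at write time.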

Upvotes: 5
