Federico Ponzi

Reputation: 2775

Spark not using spark.sql.parquet.compression.codec

I'm comparing Spark's Parquet files with Apache Drill's, and Drill's are far more lightweight than Spark's. Spark uses GZIP as its default compression codec, so as an experiment I tried changing it:

snappy: same size
uncompressed: same size
lzo: exception

I tried both ways:

sqlContext.sql("SET spark.sql.parquet.compression.codec=uncompressed")
sqlContext.setConf("spark.sql.parquet.compression.codec.", "uncompressed")

But it seems like it doesn't change the setting.
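Note that the key in the second setConf call above ends with a trailing dot, so it sets a different property than spark.sql.parquet.compression.codec. One quick sanity check is to read the value back after setting it; a minimal sketch:

sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")
println(sqlContext.getConf("spark.sql.parquet.compression.codec")) // should print "uncompressed"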

Upvotes: 11

Views: 22117

Answers (6)

Vijay Krishna

Reputation: 1067

Try this. It seems to work for me in Spark 1.6.0:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(sparkConf) // sparkConf built elsewhere
val sqlContext = new HiveContext(sc)
sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")

Upvotes: 3

egonzalpe

Reputation: 26

For Spark 1.6: you can use different compression codecs. Try:

sqlContext.setConf("spark.sql.parquet.compression.codec","gzip")
sqlContext.setConf("spark.sql.parquet.compression.codec","lzo")    
sqlContext.setConf("spark.sql.parquet.compression.codec","snappy")
sqlContext.setConf("spark.sql.parquet.compression.codec","uncompressed")

Upvotes: 1

ruseel

Reputation: 1734

This worked for me in Spark 2.1.1:

df.write.option("compression","snappy").parquet(filename)
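This write-time option applies only to that write and takes precedence over the session-level spark.sql.parquet.compression.codec setting. As a follow-up check (the output path is hypothetical), the part files are suffixed with the codec name, which confirms what was actually used:

df.write.option("compression", "snappy").parquet("/tmp/out.parquet") // hypothetical path
// Output part files look like part-00000-<uuid>-c000.snappy.parquet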

Upvotes: 12

yogesh

Reputation: 1

When facing issues while storing into Hive via the Hive context, use:

hc.sql("set parquet.compression=snappy")

Upvotes: 0

BVR

Reputation: 21

For Spark 1.3, the spark.sql.parquet.compression.codec parameter did not compress the output, but the statement below did work:

sqlContext.sql("SET parquet.compression=SNAPPY")

Upvotes: 2

J Maurer

Reputation: 1044

Try:

sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

I see you already did this, but I'm unable to delete my answer on mobile. Try setting this before the SQLContext is created, as suggested in the comment.
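One way to set the codec before the SQLContext exists is through SparkConf, since spark.sql.* entries are picked up as SQL settings when the context is created; a sketch assuming Spark 1.x, with an arbitrary app name:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("parquet-codec-example") // hypothetical app name
  .setMaster("local[*]")               // for a standalone run; spark-submit sets this otherwise
  .set("spark.sql.parquet.compression.codec", "snappy")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc) // picks up the spark.sql.* setting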

Upvotes: 0
