Reputation: 2775
I'm comparing Spark's parquet files with Apache Drill's. Drill's parquet files are much more lightweight than Spark's. Spark uses GZIP as the default compression codec, so for experimenting I tried changing it to snappy (same size), uncompressed (same size), and lzo (exception).
I tried both ways:
sqlContext.sql("SET spark.sql.parquet.compression.codec=uncompressed")
sqlContext.setConf("spark.sql.parquet.compression.codec.", "uncompressed")
But it seems like it doesn't change the setting.
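For reference, a minimal sketch of the size experiment (assuming an existing DataFrame df; the output path is made up):

// Write the same DataFrame under each codec, then compare directory
// sizes on disk (e.g. with hadoop fs -du).
for (codec <- Seq("gzip", "snappy", "uncompressed")) {
  sqlContext.setConf("spark.sql.parquet.compression.codec", codec)
  df.write.parquet(s"/tmp/parquet_codec_test/$codec")  // hypothetical path
}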
Upvotes: 11
Views: 22117
Reputation: 1067
Try this. It seems to work for me in Spark 1.6.0:
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(sparkConf)  // sparkConf: an existing SparkConf
val sqlContext = new HiveContext(sc)
sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")
Upvotes: 3
Reputation: 26
For Spark 1.6: you can use different compression codecs. Try:
sqlContext.setConf("spark.sql.parquet.compression.codec","gzip")
sqlContext.setConf("spark.sql.parquet.compression.codec","lzo")
sqlContext.setConf("spark.sql.parquet.compression.codec","snappy")
sqlContext.setConf("spark.sql.parquet.compression.codec","uncompressed")
Upvotes: 1
Reputation: 1734
Worked for me in Spark 2.1.1:
df.write.option("compression","snappy").parquet(filename)
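In Spark 2.x you can also set the codec once on the session instead of per write (a sketch; spark is the usual SparkSession handle, and the per-write option above takes precedence over it):

spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
df.write.parquet(filename)  // now uses snappy without a per-write option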
Upvotes: 12
Reputation: 1
If you face issues while storing into Hive via a HiveContext, use:
hc.sql("set parquet.compression=snappy")
Upvotes: 0
Reputation: 21
For Spark 1.3, the spark.sql.parquet.compression.codec parameter did not compress the output, but the one below did:
sqlContext.sql("SET parquet.compression=SNAPPY")
Upvotes: 2
Reputation: 1044
Try:
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
I see you already tried this, but I'm unable to delete my answer on mobile. Try setting this before creating the SQLContext, as suggested in the comment.
Upvotes: 0