I'm writing ORC files from a Spark streaming job and have enabled the ORC bloom filter on a few columns, for example with writeStream.option("orc.bloom.filter.columns", "col1, col2, col3"). The write completes without any errors, but the bloom filter does not seem to be applied to the output files.
Could someone please clarify what is going wrong?
Sample code:

val sparkAlt = org.apache.spark.sql.SparkSession.builder().getOrCreate()

val df = sparkAlt.readStream
  .format("orc")
  .option("path", "/tmp/bintime=1705791601/")
  .schema(schema)
  .load()

df.writeStream
  .format("orc")
  .option("orc.bloom.filter.columns", "txnid")
  .option("checkpointLocation", "/path/checkpoint")
  .option("path", "outputpath/")
  .start()
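One way to narrow this down might be to check whether the same option takes effect in a plain batch write. The following is only a rough sketch for comparison, not a fix: the object name, the bloomColumns helper, the local master, the sample rows, and the output path /tmp/orc_bloom_batch_check are all made up for illustration; only the txnid column and the orc.bloom.filter.columns option come from the code above.

```scala
import org.apache.spark.sql.SparkSession

object BloomFilterBatchCheck {
  // Hypothetical helper: trims each name and joins with commas,
  // so no column name carries stray whitespace into the option value.
  def bloomColumns(cols: Seq[String]): String =
    cols.map(_.trim).mkString(",")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-bloom-filter-batch-check")
      .master("local[*]") // assumption: run locally for the comparison
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows; only the column name comes from the question.
    val df = Seq("t1", "t2", "t3").toDF("txnid")

    // Batch write using the same ORC option as the streaming job.
    df.write
      .format("orc")
      .option("orc.bloom.filter.columns", bloomColumns(Seq("txnid")))
      .mode("overwrite")
      .save("/tmp/orc_bloom_batch_check") // hypothetical output path

    spark.stop()
  }
}
```

If the batch output and the streaming output are then inspected with the same ORC metadata tool, any difference between the two should show whether the option is being dropped on the streaming path specifically.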