Reputation: 53
I enabled spark.sql.thriftServer.incrementalCollect
in my Thrift server (Spark 3.1.2) to prevent OutOfMemory exceptions. This works, but my queries are now very slow. I checked the logs and found that the Thrift server is returning batches of 10,000 rows:
INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [1260000, 1270000) with 169312d3-1dea-4069-94ba-ec73ac8bef80
My hardware could handle batches 10-50x that size.
This issue and this documentation page suggest setting spark.sql.inMemoryColumnarStorage.batchSize, but that didn't work.
Is it possible to configure the value?
Upvotes: 3
Views: 302
Reputation: 536
spark.sql.inMemoryColumnarStorage.batchSize controls the batch size for in-memory columnar caching, not the fetch size used per incremental collect. Read the Spark Thrift Server code in the open-source repo to check the exact usage.
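If the 10,000-row batches are coming from the client's fetch request (the Thrift protocol lets the client specify a max-rows value per fetch, which the server uses when slicing the result iterator), you may be able to raise the batch size from the JDBC side rather than through a server conf. A minimal sketch using the Hive JDBC driver, with a hypothetical localhost server and table name:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LargerFetchBatches {
        public static void main(String[] args) throws Exception {
            // Requires the hive-jdbc driver on the classpath.
            // Host, port, credentials, and table name are placeholders.
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "user", "");
                 Statement stmt = conn.createStatement()) {
                // Ask for up to 100k rows per Thrift fetch round-trip
                // instead of the client default. With incrementalCollect
                // enabled, the server still streams the result partition
                // by partition; this only changes how many rows each
                // fetch request returns.
                stmt.setFetchSize(100_000);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) {
                        // consume rows here
                    }
                }
            }
        }
    }

If you connect through beeline or another client instead, check whether it exposes a fetch-size setting; the server honors whatever max-rows value arrives in each fetch request.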
Upvotes: 0