Reputation: 53
I enabled spark.sql.thriftServer.incrementalCollect
in my Thrift server (Spark 3.1.2) to prevent OutOfMemory exceptions. This works, but my queries are now very slow. I checked the logs and found that the Thrift server is returning batches of 10,000 rows:
INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [1260000, 1270000) with 169312d3-1dea-4069-94ba-ec73ac8bef80
My hardware could handle batches 10-50x that size.
This issue and this documentation page suggest setting spark.sql.inMemoryColumnarStorage.batchSize, but that didn't work.
Is it possible to configure the value?
Upvotes: 3
Views: 302
Reputation: 536
spark.sql.inMemoryColumnarStorage.batchSize controls the batch size for in-memory columnar caching, not the fetch size used per incremental collect. Read the Spark Thrift Server code in the open-source repo to check the exact usage.
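If the 10,000-row batches are coming from the client's fetch request (the Thrift protocol lets the client specify a max-rows value per fetch, which the server uses when slicing the result iterator), you may be able to raise the batch size from the JDBC side rather than through a server conf. A minimal sketch using the Hive JDBC driver, with a hypothetical localhost server and table name:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LargerFetchBatches {
        public static void main(String[] args) throws Exception {
            // Requires the hive-jdbc driver on the classpath.
            // Host, port, credentials, and table name are placeholders.
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "user", "");
                 Statement stmt = conn.createStatement()) {
                // Ask for up to 100k rows per Thrift fetch round-trip
                // instead of the client default. With incrementalCollect
                // enabled, the server still streams the result partition
                // by partition; this only changes how many rows each
                // fetch request returns.
                stmt.setFetchSize(100_000);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) {
                        // consume rows here
                    }
                }
            }
        }
    }

If you connect through beeline or another client instead, check whether it exposes a fetch-size setting; the server honors whatever max-rows value arrives in each fetch request.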
Upvotes: 0