fokoenecke

Reputation: 53

How to set row batch size for incrementalCollect in Apache Spark Thrift server?

I enabled spark.sql.thriftServer.incrementalCollect on my Thrift server (Spark 3.1.2) to prevent OutOfMemory exceptions. This works fine, but my queries are really slow now. I checked the logs and found that the Thrift server is returning results in batches of 10,000 rows:

INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [1260000, 1270000) with 169312d3-1dea-4069-94ba-ec73ac8bef80
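For reference, a minimal sketch of how I enabled the flag (the app name is a placeholder; the same setting can be passed as --conf to sbin/start-thriftserver.sh or put in spark-defaults.conf):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Sketch: start the Thrift server with incrementalCollect enabled.
val spark = SparkSession.builder()
  .appName("thrift-incremental") // hypothetical app name
  .enableHiveSupport()
  .config("spark.sql.thriftServer.incrementalCollect", "true")
  .getOrCreate()

HiveThriftServer2.startWithContext(spark.sqlContext)
```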

My hardware could easily handle batches 10x-50x that size. This issue and this documentation page suggest setting spark.sql.inMemoryColumnarStorage.batchSize, but that didn't work.

Is it possible to configure the value?

Upvotes: 3

Views: 302

Answers (1)

RockSolid

Reputation: 536

spark.sql.inMemoryColumnarStorage.batchSize controls the batch size for Spark's in-memory columnar caching; it is not the fetch size used per incremental-collect batch. Read the Spark Thrift server code in the open-source repo to check the exact usage.
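Not a definitive answer, but if I read SparkExecuteStatementOperation right, the per-batch row count comes from the maxRows field of the client's Thrift FetchResults request, which means it would be raised on the client side rather than via a server config. A hedged sketch using the Hive JDBC driver (the host, credentials, and table name are placeholders, and the driver must be on the classpath):

```scala
import java.sql.DriverManager

// Hedged sketch: with the Hive JDBC driver, Statement.setFetchSize
// should control how many rows each Thrift FetchResults call requests.
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "user", "")
val stmt = conn.createStatement()
stmt.setFetchSize(100000) // request 100k rows per incremental batch
val rs = stmt.executeQuery("SELECT * FROM some_large_table")
while (rs.next()) {
  // consume rows; each fetch round-trip should now return larger batches
}
rs.close(); stmt.close(); conn.close()
```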

Upvotes: 0
