Reputation: 2579
I'm using Cassandra 3.11.4 and Spark 2.3.3. When I query a large number of partition keys with joinWithCassandraTable (3 months of data with minute as the partition key, i.e. 3 * 30 * 24 * 60 ≈ 130,000 partition keys), I see many slow-query timeout entries in Cassandra's debug.log, such as:
<SELECT * FROM event_keyspace.event_table WHERE partitionkey1, partitionkey2 = value1, value2 AND column_key = column_value1 LIMIT 5000>, time 599 msec - slow timeout 500 msec
<SELECT * FROM event_keyspace.event_table WHERE partitionkey1, partitionkey2 = value5, value6 AND column_key = column_value5 LIMIT 5000>, time 591 msec - slow timeout 500 msec/cross-node
I'm using repartitionByCassandraReplica before joinWithCassandraTable.
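A minimal Scala sketch of the read path (keyspace, table and column names follow the log above; the key class, host, and key values are placeholders):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder key class matching the two partition key columns from the log
case class EventKey(partitionkey1: String, partitionkey2: String)

val sparkConf = new SparkConf()
  .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
  .set("spark.cassandra.input.fetch.size_in_rows", "20000")
val sc = new SparkContext(sparkConf)

// One key per minute over the 3-month window (placeholder values shown)
val keysRdd = sc.parallelize(Seq(EventKey("value1", "value2"), EventKey("value5", "value6")))

val events = keysRdd
  .repartitionByCassandraReplica("event_keyspace", "event_table")
  .joinWithCassandraTable("event_keyspace", "event_table")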
I see that disk I/O goes to 100%. If I change the data model so that hour is the partition key instead of minute, the partitions become too large, which is not acceptable.
I suspect this LIMIT 5000 may be the cause, but even after setting input.fetch.size_in_rows the logged query did not change:
sparkConf.set("spark.cassandra.input.fetch.size_in_rows", "20000");
How can I change this LIMIT 5000 clause?
Upvotes: 2
Views: 950
Reputation: 43
Did you try reducing spark.cassandra.input.split.size? It sounds like all of the data is falling into the same partition.
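For example, something like the line below (a sketch; the exact property name depends on your connector version, and on the 2.x Spark Cassandra Connector it is spelled spark.cassandra.input.split.size_in_mb):

// Smaller splits mean more, smaller Spark tasks per Cassandra token range.
// Property name assumes Spark Cassandra Connector 2.x (split size in MB).
sparkConf.set("spark.cassandra.input.split.size_in_mb", "64")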
Upvotes: 0