Reputation: 1615
I am joining a Spark RDD with a Cassandra table (as a lookup), but I am not able to understand a few things:
Will Spark pull the whole Cassandra table and then join it with the RDD in Spark memory, or will it push the values from the RDD down to Cassandra and perform the join there (i.e. does the join happen in Cassandra or Spark)?
Why does Spark always pull the same number of records from Cassandra no matter what limit is applied (1 or 1000)? Code below:
import com.datastax.spark.connector._   // needed for joinWithCassandraTable and SomeColumns

// creating a dataframe with the fields required for the join with the Cassandra table
// and converting it to an RDD
val df_for_join = src_df.select(src_df("col1"), src_df("col2"))
val rdd_for_join = df_for_join.rdd

val result_rdd = rdd_for_join
  .joinWithCassandraTable("my_keyspace", "my_table",
    selectedColumns = SomeColumns("col1", "col2", "col3", "col4"),
    joinColumns = SomeColumns("col1", "col2"))
  .where("created_at > 'range_start' and created_at <= 'range_end'")
  .clusteringOrder(Ascending)
  .limit(1)
Cassandra table details -
PRIMARY KEY ((col1, col2), created_at) WITH CLUSTERING ORDER BY (created_at ASC)
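For context, a table matching this primary key could be created roughly as follows (a sketch only: the column types are assumptions, not taken from the actual schema, and it assumes a spark-shell-style sc is available):
import com.datastax.spark.connector.cql.CassandraConnector

// Sketch of a table matching the stated PRIMARY KEY; column types are assumed.
CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute(
    """CREATE TABLE IF NOT EXISTS my_keyspace.my_table (
      |  col1       text,
      |  col2       text,
      |  created_at timestamp,
      |  col3       text,
      |  col4       text,
      |  PRIMARY KEY ((col1, col2), created_at)
      |) WITH CLUSTERING ORDER BY (created_at ASC)""".stripMargin)
}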
Upvotes: 0
Views: 1220
Reputation: 87119
joinWithCassandraTable extracts the partition/primary key values from the passed RDD and converts them into individual requests against the corresponding partitions in Cassandra. On top of that, SCC may apply additional filtering, such as your where condition. If I remember correctly (but I could be wrong), the limit won't be pushed down to Cassandra completely - it may still fetch up to limit rows per partition.
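In other words, for every distinct (col1, col2) pair coming from the RDD, the connector issues a targeted request against that partition, roughly a per-key SELECT with your range predicate and the limit attached. If what you actually need is one row overall rather than up to one row per key, you can apply that final cut on the Spark side; a minimal sketch, assuming the result_rdd from the question:
// Conceptually, per (col1, col2) key taken from the RDD the connector runs something like:
//   SELECT col1, col2, col3, col4 FROM my_keyspace.my_table
//   WHERE col1 = ? AND col2 = ?
//     AND created_at > 'range_start' AND created_at <= 'range_end'
//   LIMIT 1
// so .limit(1) may cap rows per key rather than the overall result.

// Hypothetical follow-up: enforce a single overall row on the Spark side.
val firstRowOverall = result_rdd
  .map { case (_, cassandraRow) => cassandraRow } // keep only the Cassandra side of the pair
  .take(1)                                        // global limit, applied in the driver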
You can always check where the join happens by executing result_rdd.toDebugString. For my code:
val df_for_join = Seq((2, 5), (5, 2)).toDF("col1", "col2")
val rdd_for_join = df_for_join.rdd

val result_rdd = rdd_for_join
  .joinWithCassandraTable("test", "jt",
    selectedColumns = SomeColumns("col1", "col2", "v"),
    joinColumns = SomeColumns("col1", "col2"))
  .where("created_at > '2020-03-13T00:00:00Z' and created_at <= '2020-03-14T00:00:00Z'")
  .limit(1)
it gives the following:
scala> result_rdd.toDebugString
res7: String =
(2) CassandraJoinRDD[14] at RDD at CassandraRDD.scala:19 []
| MapPartitionsRDD[2] at rdd at <console>:45 []
| MapPartitionsRDD[1] at rdd at <console>:45 []
| ParallelCollectionRDD[0] at rdd at <console>:45 []
while if you do a "normal" join, you'll get the following:
scala> val rdd1 = sc.parallelize(Seq((2, 5),(5, 2)))
rdd1: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[21] at parallelize at <console>:44
scala> val ct = sc.cassandraTable[(Int, Int)]("test", "jt").select("col1", "col2")
ct: com.datastax.spark.connector.rdd.CassandraTableScanRDD[(Int, Int)] = CassandraTableScanRDD[31] at RDD at CassandraRDD.scala:19
scala> rdd1.join(ct)
res15: org.apache.spark.rdd.RDD[(Int, (Int, Int))] = MapPartitionsRDD[34] at join at <console>:49
scala> rdd1.join(ct).toDebugString
res16: String =
(6) MapPartitionsRDD[37] at join at <console>:49 []
| MapPartitionsRDD[36] at join at <console>:49 []
| CoGroupedRDD[35] at join at <console>:49 []
+-(3) ParallelCollectionRDD[21] at parallelize at <console>:44 []
+-(6) CassandraTableScanRDD[31] at RDD at CassandraRDD.scala:19 []
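If you don't want to eyeball the lineage, a rough programmatic check is also possible (a sketch; the helper below is just for illustration): the pushed-down join shows up as a CassandraJoinRDD at the top of the lineage, while the plain RDD join goes through a CoGroupedRDD over a full CassandraTableScanRDD.
// Illustration only: inspect the lineage string to see where the join happens.
def joinPushedDown(rdd: org.apache.spark.rdd.RDD[_]): Boolean =
  rdd.toDebugString.contains("CassandraJoinRDD")

println(joinPushedDown(result_rdd))    // true  -> per-partition-key requests to Cassandra
println(joinPushedDown(rdd1.join(ct))) // false -> full table scan co-grouped in Spark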
More information is available in the corresponding section of the SCC documentation.
Upvotes: 2