user2552806

Is joinWithCassandraTable() lazy?

I'm using Spark 1.2.1 with spark-cassandra-connector:

//join with cassandra
val rdd = some_array.map(x => SomeClass(x._1,x._2)).joinWithCassandraTable(keyspace, some_table)
println(timer, "Join")

//get only the jsons and create rdd temp table
val jsons = rdd.map(_._2.getString("this"))
val jsonSchemaRDD = sqlContext.jsonRDD(jsons)
jsonSchemaRDD.registerTempTable("this_json")
println(timer, "Map")

The output is:

Timer "Join"- 558 ms
Timer "Map"- 290284 ms

I guess the joinWithCassandraTable() function is lazy. If so, what fires it up?

Answers (1)

zero323

Actually, the part that triggers evaluation here is sqlContext.jsonRDD. Since you don't provide a schema, it has to materialize jsons in order to infer one.
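To make this concrete, here is a plain Scala analogy. It is not Spark code and not the connector's API: a Scala Iterator plays the role of the lazy RDD, the hypothetical jsons function stands in for rdd.map(_._2.getString("this")), and the hypothetical inferFields function stands in for jsonRDD's schema inference. It is only a sketch under those assumptions; the counter shows that nothing is extracted until inference has to read every record.

```scala
// Analogy only: a plain Scala Iterator stands in for a lazy RDD. This is not Spark code.
object LazyInferenceDemo {
  // How many times the per-row extraction has actually run.
  var extracted = 0

  // Hypothetical stand-in for rdd.map(_._2.getString("this")):
  // building the Iterator runs nothing; each element is extracted only when pulled.
  def jsons(rows: Seq[(Int, String)]): Iterator[String] =
    rows.iterator.map { case (_, json) => extracted += 1; json }

  // Hypothetical stand-in for schema inference: to learn every field name it
  // must read every record, so it forces the whole upstream pipeline to run.
  private val FieldName = "\"(\\w+)\":".r
  def inferFields(records: Iterator[String]): Set[String] =
    records.flatMap(r => FieldName.findAllMatchIn(r).map(_.group(1))).toSet

  def main(args: Array[String]): Unit = {
    val rows = Seq(1 -> """{"a":1}""", 2 -> """{"a":2,"b":3}""")

    val pending = jsons(rows)
    println(s"after map: extracted = $extracted") // nothing has been extracted yet

    val fields = inferFields(pending) // this step pulls every record
    println(s"after inference: extracted = $extracted, fields = $fields")

    // With the schema known up front, defining the pipeline stays lazy.
    extracted = 0
    val knownFields = Set("a", "b")
    val stillPending = jsons(rows)
    println(s"schema supplied: extracted = $extracted, fields = $knownFields")
  }
}
```

In Spark 1.2 itself, SQLContext also has a jsonRDD(json: RDD[String], schema: StructType) overload. Passing a StructType that describes your JSON should let it skip the inference pass, so the expensive join work would then show up at the first action instead of at the jsonRDD call; the total cost does not disappear, it just moves.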

joinWithCassandraTable is somewhat similar: when you call it, the driver has to connect to Cassandra and fetch the required table metadata, so some work happens up front even though the join itself only runs later. See Apache Spark: Driver (instead of just the Executors) tries to connect to Cassandra
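That mix of eager driver-side setup and deferred per-row work can be sketched with another plain Scala analogy. JoinLike, its fetchMetadata argument and the counters are hypothetical names for illustration only; they are not the connector's classes. The constructor makes the "metadata" call immediately, while the per-key lookups only run when the rows are consumed, which is roughly why the "Join" timer stays small.

```scala
// Analogy only: JoinLike is a hypothetical class, not the connector's join RDD.
final class JoinLike(keys: Seq[Int], fetchMetadata: () => Set[String]) {
  // Eager: runs as soon as the object is built, like the driver-side
  // connection and table-metadata lookup done when the join is defined.
  val columns: Set[String] = fetchMetadata()

  // Lazy: the per-key lookups run only when the returned Iterator is consumed.
  def rows(lookup: Int => String): Iterator[(Int, String)] =
    keys.iterator.map(k => k -> lookup(k))
}

object JoinLikeDemo {
  var metadataCalls = 0
  var lookups = 0

  def main(args: Array[String]): Unit = {
    val join = new JoinLike(Seq(1, 2, 3), () => { metadataCalls += 1; Set("id", "this") })
    val pending = join.rows { k => lookups += 1; s"row-$k" }
    println(s"defined: metadataCalls = $metadataCalls, lookups = $lookups")

    val materialized = pending.toList // plays the role of the action
    println(s"consumed: metadataCalls = $metadataCalls, lookups = $lookups, rows = ${materialized.size}")
  }
}
```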
