Reputation: 643
Can anyone please help me on the below query. I have an RDD with 5 columns. I want to join with a table in Cassandra. I knew that there is a way to do that by using "joinWithCassandraTable"
I see somewhere a syntax to use it. Syntax: RDD.joinWithCassandraTable(KEYSPACE, tablename, SomeColumns("cola","colb")) .on(SomeColumns("colc"))
Can anyone please send me the correct syntax??
I would like to actually know where to mention the column name of a table which is a key to join.
Upvotes: 1
Views: 6284
Reputation: 71
Use below syntax for join in cassandra
joinedData = rdd.joinWithCassandraTable(keyspace,table).on(partitionKeyName).select(Column Names)
It will look something like this,
joinedData = rdd.joinWithCassandraTable(keyspace,table).on('emp_id').select('emp_name', 'emp_city')
Upvotes: 0
Reputation: 16576
JoinWithCassandraTable works by pulling only the partition keys which match your RDD entries from C* so it only works on partition keys.
The documentation is here https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md#using-joinwithcassandratable
and API Doc is here
The jWCT table method can be used without the fluent api by specifying all the arguments in the method
def joinWithCassandraTable[R](
keyspaceName: String,
tableName: String,
selectedColumns: ColumnSelector = AllColumns,
joinColumns: ColumnSelector = PartitionKeyColumns)
But the fluent api can also be used
joinWithCassandraTable[R](keyspace, tableName).select(AllColumns).on(PartitionKeyColumns)
These two calls are equivalent
Your example
RDD.joinWithCassandraTable(KEYSPACE, tablename, SomeColumns("cola","colb")) .on(SomeColumns("colc"))
Uses the Object from RDD
to join against colc
of tablename
and only returns cola
and colb
as join results.
Upvotes: 2