Reputation: 1113
I want to migrate my old cassandra cluster data to a new cluster and thinking to write some spark jobs to do that. Is there any way to interact with multiple cassandra cluster from the same SparkContext. So that i can read the data from one cluster and write to another cluster using saveToCassandra function inside the same sparkJob.
val products = sc.cassandraTable("first_cluster","products").cache()
products.saveToCassandra("diff_cluster","products2")
Can we save the data into a different cluster ?
Upvotes: 1
Views: 963
Reputation: 6218
Example from spark-cassandra-connector
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext
def twoClusterExample ( sc: SparkContext) = {
val connectorToClusterOne = CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "127.0.0.1"))
val connectorToClusterTwo = CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "127.0.0.2"))
val rddFromClusterOne = {
// Sets connectorToClusterOne as default connection for everything in this code block
implicit val c = connectorToClusterOne
sc.cassandraTable("ks","tab")
}
{
//Sets connectorToClusterTwo as the default connection for everything in this code block
implicit val c = connectorToClusterTwo
rddFromClusterOne.saveToCassandra("ks","tab")
}
}
Upvotes: 5