Reputation: 2740
I am new to Spark. I want to save my Spark data to Cassandra, with the condition that I have an RDD and I want to save the data of this RDD into more than one table in Cassandra. Is this possible? If yes, then how?
Upvotes: 3
Views: 9386
Reputation: 29
Imagine your table is the following:
CREATE TABLE ks.test (
    id uuid,
    sampleId text,
    validated boolean,
    cell text,
    gene text,
    state varchar,
    data bigint,
    PRIMARY KEY (id, sampleId)
);
How can you update only the 'validated' field for a given sampleId in the test table of the keyspace ks? You can use the following code to update the table from Python.
from pyspark import SparkConf
import pyspark_cassandra
from pyspark_cassandra import CassandraSparkContext

# Point the connector at your Cassandra node (replace the placeholders).
conf = SparkConf().set("spark.cassandra.connection.host", "<IP>") \
                  .set("spark.cassandra.connection.native.port", "<port>")
sparkContext = CassandraSparkContext(conf=conf)

# Write only the listed columns for the given primary key; the other columns stay untouched.
rdd = sparkContext.parallelize([{"validated": False, "sampleId": "323112121", "id": "121224235-11e5-9023-23789786ess"}])
rdd.saveToCassandra("ks", "test", {"validated", "sampleId", "id"})
Upvotes: 1
Reputation: 37435
Use the Spark-Cassandra Connector.
Here is how to save data to Cassandra, taken from the connector docs (the import makes saveToCassandra available on RDDs):
import com.datastax.spark.connector._

val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
collection.saveToCassandra("test", "words", SomeColumns("word", "count"))
See the project and full documentation here: https://github.com/datastax/spark-cassandra-connector
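To address the multi-table part of the question: each saveToCassandra call writes to a single table, so to save one RDD into more than one table you simply call it once per target table on the same (ideally cached) RDD. A minimal Scala sketch, assuming the keyspace test already contains two tables, words and words_copy, each with word text and count int columns (the table names and host are just illustrative placeholders):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("multi-table-save")
  .set("spark.cassandra.connection.host", "<cassandra_host>") // placeholder host
val sc = new SparkContext(conf)

// One RDD of (word, count) pairs.
val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))

// Cache so the data is not recomputed for the second write.
collection.cache()

// Call saveToCassandra once per target table; both tables must already exist.
collection.saveToCassandra("test", "words", SomeColumns("word", "count"))
collection.saveToCassandra("test", "words_copy", SomeColumns("word", "count"))

If the tables have different schemas, map the RDD into the appropriate shape (or pass a different SomeColumns selection) before each save.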
Upvotes: 3