Abhimanyu

Reputation: 2740

How to use saveToCassandra()

I am new to Spark. I want to save my Spark data to Cassandra, with one condition: I have an RDD and I want to save the data of this RDD into more than one table in Cassandra. Is this possible? If yes, how?

Upvotes: 3

Views: 9386

Answers (2)

Rajesh

Reputation: 29

Python pyspark Cassandra saveToCassandra Spark

Imagine your table is the following:

CREATE TABLE ks.test (
  id uuid,
  sampleId text,
  validated boolean,
  cell text,
  gene text,
  state varchar,
  data bigint,
  PRIMARY KEY (id, sampleId)
);

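How can you update only the validated column of a given row in the test table of keyspace ks? In Cassandra a write is an upsert addressed by the full primary key, so the row you save must contain both id and sampleId; columns you do not list are left unchanged. You can use the following code to update the table from Python: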


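from pyspark import SparkConf
import pyspark_cassandra  # also makes saveToCassandra() available on RDDs
from pyspark_cassandra import CassandraSparkContext

# Replace <IP1> with a Cassandra contact host and <IP2> with its native-protocol port
conf = (SparkConf()
        .set("spark.cassandra.connection.host", "<IP1>")
        .set("spark.cassandra.connection.native.port", "<IP2>"))

sparkContext = CassandraSparkContext(conf=conf)

# The row carries the full primary key (id, sampleid) plus the column to update.
# CQL folds the unquoted identifier sampleId to lowercase, so the column is "sampleid".
# The id below is a placeholder; it must be replaced with a valid UUID string.
rdd = sparkContext.parallelize([{"validated": False, "sampleid": "323112121", "id": "121224235-11e5-9023-23789786ess"}])

# Only the listed columns are written; cell, gene, state and data keep their values.
rdd.saveToCassandra("ks", "test", ("validated", "sampleid", "id"))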
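The same idea can be wrapped in a small helper so a partial update always includes the primary key. This is a minimal sketch: `partial_update` and `PRIMARY_KEY` are hypothetical names, and the column names assume the ks.test definition above with sampleId folded to lowercase. It only builds the row and the column list; the actual write is still the `saveToCassandra` call shown in the answer.

```python
# Primary key of ks.test as declared above (hypothetical constant for this sketch).
# CQL folds the unquoted identifier sampleId to lowercase, hence "sampleid".
PRIMARY_KEY = ("id", "sampleid")


def partial_update(key, **changes):
    """Build a (row, columns) pair that writes only `changes` for one row.

    Cassandra writes are upserts: the full primary key selects the row and
    only the listed columns are written, so the other columns keep their values.
    """
    missing = [k for k in PRIMARY_KEY if k not in key]
    if missing:
        raise ValueError("missing primary key column(s): " + ", ".join(missing))
    row = dict(key)
    row.update(changes)
    # Primary key columns first, then the updated columns in a stable order.
    columns = PRIMARY_KEY + tuple(sorted(changes))
    return row, columns
```

To use it, pass the returned `row` to `sparkContext.parallelize([row])` and the returned `columns` as the third argument of `saveToCassandra("ks", "test", columns)`. Raising early on a missing key column surfaces the mistake before a Spark job is launched.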

Upvotes: 1

maasg
maasg

Reputation: 37435

Use the Spark-Cassandra Connector.

To save data to Cassandra, here is the example from the docs:

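import com.datastax.spark.connector._  // provides saveToCassandra and SomeColumns

val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
// Write each tuple into the "word" and "count" columns of the test.words table
collection.saveToCassandra("test", "words", SomeColumns("word", "count"))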

See the project and full documentation here: https://github.com/datastax/spark-cassandra-connector
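Writing one RDD into more than one table, as the question asks, amounts to calling saveToCassandra once per target table. A hedged sketch in Python, assuming the pyspark_cassandra package used in the other answer (where `rdd.saveToCassandra(keyspace, table, columns)` is available on RDDs) and rows stored as dicts: each table gets its own projection of the rows and its own write. `save_to_tables` and `_projector` are hypothetical helper names, not part of either library.

```python
def _projector(columns):
    """Return a function that keeps only `columns` from a dict row.

    Building the function here (instead of a lambda inside the loop) binds
    `columns` per table and avoids Python's late-binding closure pitfall.
    """
    def project(row):
        return {c: row[c] for c in columns}
    return project


def save_to_tables(rdd, keyspace, table_columns):
    """Write one RDD of dict rows into several Cassandra tables.

    table_columns maps each table name to the tuple of columns it receives.
    Assumes saveToCassandra(keyspace, table, columns) as in pyspark_cassandra.
    """
    # Each saveToCassandra call is a separate Spark job; caching keeps the
    # source RDD from being recomputed once per table.
    rdd = rdd.cache()
    for table, columns in table_columns.items():
        rdd.map(_projector(columns)).saveToCassandra(keyspace, table, columns)
```

For example, `save_to_tables(rdd, "ks", {"test": ("id", "sampleid", "validated"), "test_by_gene": ("gene", "id")})` would write the same rows into ks.test and into `test_by_gene`, a hypothetical second table whose columns are assumed to exist. Every table still needs its full primary key among the projected columns.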

Upvotes: 3
