Reputation: 2589
I need to connect my standalone Spark to my Cassandra instance in Python. I downloaded Apache Spark from the Apache website, then extracted and built it as follows:
tar -xvf spark-1.4.1.tgz
sbt/sbt assembly
I updated my .bashrc file and I can run Spark. I also have Cassandra set up, and I can pull data from it in my Python program.
How do I connect Spark to the Cassandra instance so that I can access Cassandra tables as Spark RDDs?
Upvotes: 1
Views: 3768
Reputation: 16576
A DataFrame-compatible interface is available through the Spark Cassandra Connector: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/15_python.md
An RDD interface that wraps the Connector is available as well: https://github.com/TargetHolding/pyspark-cassandra
In both cases you will end up adding the package/lib to your application via
--packages or --jars
and specifying your Cassandra connection host
--conf spark.cassandra.connection.host=yourhost
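As a rough sketch of how those flags fit together, the snippet below assembles a spark-submit command line in Python. The helper name, the script name, the host, and the GROUP:ARTIFACT:VERSION coordinate are placeholders of mine, not values from this answer; substitute the connector coordinate that matches your Spark and Scala versions.

```python
import shlex


def build_submit_command(script, host, package=None, jars=None):
    """Assemble a spark-submit argument list.

    Illustrative helper only; it is not part of Spark or the connector.
    """
    cmd = ["spark-submit"]
    if package:
        # --packages takes Maven coordinates (comma-separated if several)
        cmd += ["--packages", package]
    if jars:
        # --jars takes a comma-separated list of local jar paths
        cmd += ["--jars", ",".join(jars)]
    # Tell the connector which Cassandra node to contact
    cmd += ["--conf", "spark.cassandra.connection.host=" + host]
    cmd.append(script)
    return cmd


# Placeholder values: replace the coordinate, host, and script with your own.
cmd = build_submit_command(
    "my_job.py", "127.0.0.1", package="GROUP:ARTIFACT:VERSION"
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The same flags should work when starting the interactive pyspark shell instead of spark-submit.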
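With the DataFrame interface (Spark Cassandra Connector):
sqlContext.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="kv", keyspace="test")\
.load().show()
With the RDD interface (pyspark-cassandra), where sc is the Cassandra-aware context that library provides:
sc.cassandraTable("keyspace", "table")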
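Since the question asks for RDDs, here is a minimal sketch that wraps the DataFrame read in a function and then exposes the result as an RDD through the DataFrame's rdd attribute. The function names are hypothetical, and the sketch assumes the connector is already on the classpath and that you pass in an existing sqlContext.

```python
def read_cassandra_table(sql_context, keyspace, table):
    """Return a DataFrame backed by the given Cassandra table.

    Hypothetical helper; assumes the Spark Cassandra Connector was loaded
    via --packages or --jars when the application was started.
    """
    return (
        sql_context.read
        .format("org.apache.spark.sql.cassandra")
        .options(table=table, keyspace=keyspace)
        .load()
    )


def read_cassandra_rdd(sql_context, keyspace, table):
    """Return the same table as an RDD of Row objects."""
    # DataFrame.rdd exposes the underlying rows as an RDD
    return read_cassandra_table(sql_context, keyspace, table).rdd


# Usage inside a pyspark session, where sqlContext already exists:
# rows = read_cassandra_rdd(sqlContext, "test", "kv")
# print(rows.take(5))
```

Going through the DataFrame keeps a single dependency (the connector itself); pyspark-cassandra's sc.cassandraTable remains the more direct route if you only need RDDs.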
Upvotes: 1