SeasonalShot

Reputation: 2589

Spark cassandra connector in Python

I need to connect my standalone Spark installation to my Cassandra instance from Python. I downloaded Apache Spark from the Apache website, then extracted and built it as follows:

tar -xvf spark-1.4.1.tgz
sbt/sbt assembly

I updated my .bashrc file and I can run Spark. I also have Cassandra set up, and I can pull data from it in my Python program.

How do I connect Spark to the Cassandra instance so that I can access Cassandra tables as Spark RDDs?

Upvotes: 1

Views: 3768

Answers (1)

RussS

Reputation: 16576

A DataFrame-compatible interface is available through the Spark Cassandra Connector: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/15_python.md

An RDD interface that wraps the Connector is available as well: https://github.com/TargetHolding/pyspark-cassandra

In both cases you add the package or library to your application when you launch it, using

--packages or --jars

and specifying your Cassandra connection host

--conf spark.cassandra.connection.host=yourhost
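To make the launch options concrete, here is a minimal sketch that assembles a spark-submit command line from them. The helper name `spark_submit_args`, the script name, the host, and the package coordinate are all hypothetical placeholders; substitute the connector coordinate that matches your Spark and Scala versions.

```python
def spark_submit_args(script, cassandra_host, packages=None, jars=None):
    """Build a spark-submit argument list (hypothetical helper, for illustration).

    packages: list of Maven-style coordinates, passed via --packages
    jars:     list of local jar paths, passed via --jars
    """
    args = ["spark-submit"]
    if packages:
        # spark-submit expects a single comma-separated list of coordinates
        args += ["--packages", ",".join(packages)]
    if jars:
        # likewise a single comma-separated list of jar paths
        args += ["--jars", ",".join(jars)]
    # Tell the connector which Cassandra node to contact
    args += ["--conf", "spark.cassandra.connection.host=" + cassandra_host]
    args.append(script)
    return args


# Placeholder values: replace the coordinate, host, and script with your own.
print(" ".join(spark_submit_args(
    "my_job.py",
    "yourhost",
    packages=["<group>:<artifact>:<version>"],
)))
```

The resulting list can also be handed to `subprocess.call` if you want to launch the job from another Python script.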

DataFrames (requires the Spark Cassandra Connector)

sqlContext.read\
    .format("org.apache.spark.sql.cassandra")\
    .options(table="kv", keyspace="test")\
    .load().show()
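If you read more than one table, the same call can be wrapped in a small function. This is only a sketch: `read_cassandra_table` is a hypothetical helper name, and it assumes a `sqlContext` that was started with the connector on the classpath and the connection host configured.

```python
def read_cassandra_table(sql_context, keyspace, table):
    """Return a DataFrame backed by a Cassandra table (hypothetical helper).

    sql_context is expected to be a SQLContext created by a Spark
    application that was launched with the Spark Cassandra Connector.
    """
    return (sql_context.read
            .format("org.apache.spark.sql.cassandra")
            .options(table=table, keyspace=keyspace)
            .load())


# Usage inside the pyspark shell or a submitted job, where sqlContext exists:
#   df = read_cassandra_table(sqlContext, "test", "kv")
#   df.printSchema()
#   df.show()
```

Returning the DataFrame instead of calling show() directly lets you filter, join, or register it as a temporary table before displaying anything.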

RDDs (requires pyspark-cassandra)

sc.cassandraTable("keyspace", "table")
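As a rough sketch of how the RDD call might be used, the helper below pulls a few rows from a table. `sample_rows` is a hypothetical name, and it assumes `sc` is a context on which pyspark-cassandra has made `cassandraTable` available; the setup shown in the comments follows my reading of that project's README, so treat it as an assumption and check it against the version you install.

```python
def sample_rows(sc, keyspace, table, n=5):
    """Return the first n rows of a Cassandra table (hypothetical helper).

    sc must expose cassandraTable(), which plain SparkContext does not;
    pyspark-cassandra is what provides it.
    """
    rdd = sc.cassandraTable(keyspace, table)
    # take() is the standard RDD action that returns up to n elements
    return rdd.take(n)


# Assumed setup, based on the pyspark-cassandra README (verify for your version):
#   from pyspark import SparkConf
#   from pyspark_cassandra import CassandraSparkContext
#   conf = SparkConf().set("spark.cassandra.connection.host", "yourhost")
#   sc = CassandraSparkContext(conf=conf)
#   for row in sample_rows(sc, "keyspace", "table"):
#       print(row)
```

Using take() instead of collect() keeps the sample small, which matters when the table is large.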

Upvotes: 1
