Reputation: 10565
I have set up a test Cassandra + Spark cluster. I am able to successfully query Cassandra from Spark if I do the following:
import org.apache.spark.sql.cassandra.CassandraSQLContext
import sqlContext.implicits._
val cc = new CassandraSQLContext(sc)
val dataframe = cc.sql("select * from my_cassandra_table")
dataframe.first
I would now like to query the data from a Python web app. All the docs on the web seem to show how to use Spark's Python shell (where the context, 'sc', is implicitly provided).
I need to be able to run Spark SQL from an independent Python script, perhaps one which serves web pages.
I haven't found any docs for this, and got no help on the apache-spark IRC channel. Am I just thinking about this wrong? Are there other tools that provide Spark SQL to less technical users? I'm completely new to Spark.
Upvotes: 0
Views: 1829
Reputation: 26211
From the Spark Programming Guide:
The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
The appName parameter is a name for your application to show on the cluster UI. master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode. In practice, when running on a cluster, you will not want to hardcode master in the program, but rather launch the application with spark-submit and receive it there. However, for local testing and unit tests, you can pass “local” to run Spark in-process.
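For example, here is a minimal sketch of a standalone script that creates its own context and reads a Cassandra table through the spark-cassandra-connector data source. The master URL, keyspace and table names are placeholders, and the connector jar must be on the classpath when you submit the job:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Placeholder master URL -- point this at your own cluster.
conf = SparkConf().setAppName("cassandra-sql-app").setMaster("spark://127.0.0.1:7077")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load a Cassandra table as a DataFrame via the connector's data source.
# "my_keyspace" and "my_cassandra_table" are placeholders.
df = sqlContext.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(keyspace="my_keyspace", table="my_cassandra_table") \
    .load()

# Register it so it can be queried with plain SQL, as in the Scala shell.
df.registerTempTable("my_cassandra_table")
result = sqlContext.sql("select * from my_cassandra_table")
print(result.first())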
You can then test your program with spark-submit.
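Assuming the script above is saved as query_cassandra.py, the submit command could look something like this (the connector coordinates below are only an example; match the version to your Spark and Scala builds):

spark-submit \
  --master spark://127.0.0.1:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 \
  query_cassandra.py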
Upvotes: 2