Shahbaz

Reputation: 10565

How to query spark sql from a python app?

I have set a test Cassandra + Spark cluster. I am able to successfully query Cassandra from spark if I do the following:

import org.apache.spark.sql.cassandra.CassandraSQLContext
import sqlContext.implicits._
val cc = new CassandraSQLContext(sc)
val dataframe = cc.sql("select * from my_cassandra_table")
dataframe.first

I would now like to query data from a Python web app. All the docs on the web seem to show how to use Spark's Python shell (where the context, 'sc', is implicitly provided).

I need to be able to run Spark SQL from a standalone Python script, perhaps one which serves web pages.

I haven't found any docs on this, and got no help on the apache-spark IRC channel. Am I just thinking about this wrong? Are there other tools that provide Spark SQL to less technical users? I'm completely new to Spark.

Upvotes: 0

Views: 1829

Answers (1)

Brian Clapper

Reputation: 26211

From the Spark Programming Guide:

The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

The appName parameter is a name for your application to show on the cluster UI. master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode. In practice, when running on a cluster, you will not want to hardcode master in the program, but rather launch the application with spark-submit and receive it there. However, for local testing and unit tests, you can pass “local” to run Spark in-process.


You can then test your program with spark-submit.
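Putting the pieces together, a standalone driver script for your use case might look like the sketch below. It assumes the spark-cassandra-connector is on the classpath and that the master URL, Cassandra host, and keyspace/table names are placeholders you would replace with your own; the connector version shown is also an assumption.

```python
# standalone_query.py -- a sketch of a standalone PySpark script that
# queries a Cassandra table with Spark SQL. Hostnames, keyspace, and
# table names are placeholders, not values from the original question.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("cassandra-sql-app")
        .setMaster("spark://your-master:7077")  # or "local[*]" for testing
        .set("spark.cassandra.connection.host", "127.0.0.1"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the Cassandra table as a DataFrame through the connector's
# data source, then register it so it can be queried with SQL.
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_keyspace", table="my_cassandra_table")
      .load())
df.registerTempTable("my_cassandra_table")

rows = sqlContext.sql("select * from my_cassandra_table").collect()
print(rows[0])
```

You would launch it with spark-submit, pulling in the connector with `--packages` (coordinates are an example for the connector of that era):

    spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 standalone_query.py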

Upvotes: 2

Related Questions