B.Mr.W.

Reputation: 19628

Read application configuration from SparkContext object

I am developing a Spark application using the pyspark shell.

I started the IPython notebook service using the command below (see here for how I created the profile):

IPYTHON_OPTS="notebook --port 8889 --profile pyspark" pyspark

Based on the documentation, there is an sc SparkContext object already created for me with some default configuration.

"In the PySpark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work."

I basically have two questions here:

(1) How can I get a summary of the configuration for the default sc object? I want to know how much memory has been allocated, how many cores I can use, etc. However, the only method I found in the pyspark API is getLocalProperty on sc, and I don't know what key argument I should pass to it.

(2) Is it possible to modify the SparkContext while working with the IPython notebook? If the configuration cannot be changed once the IPython notebook has started, is there a file somewhere where I can configure sc?

I am fairly new to Spark; the more information (resources) you can provide, the better. Thanks!

Upvotes: 2

Views: 1204

Answers (1)

WestCoastProjects

Reputation: 63062

You are not required to use the pyspark shell: you can import the pyspark classes and then instantiate the SparkContext yourself.

from pyspark import SparkContext, SparkConf

Set up your custom config:

# appName and master are placeholders, e.g. "MyApp" and "local[4]" or "spark://host:7077"
conf = SparkConf().setAppName(appName).setMaster(master)
# set values into conf here ..
sc = SparkContext(conf=conf)
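
Once the context exists (whether you built it yourself or the shell did), you can read its effective settings back, which covers question (1). A minimal sketch, assuming a PySpark version where SparkContext.getConf() is available:

# the shell / notebook already provides sc, so you can use it directly;
# getConf() returns a copy of the SparkConf the context is actually using,
# and getAll() lists the (key, value) pairs that have been set
for key, value in sc.getConf().getAll():
    print(key, value)

print(sc.master)               # e.g. "local[*]" or "spark://host:7077"
print(sc.appName)              # application name
print(sc.defaultParallelism)   # rough indication of how many cores are usable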

You may also want to look at the general spark-env.sh:

conf/spark-env.sh.template # copy to conf/spark-env.sh and then modify values as useful to you

e.g. some of the values you may customize:

# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
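
If you want to change the configuration from inside an already-running notebook session, one workaround is to stop the shell-provided sc and build a replacement. A rough sketch, with spark.executor.memory and spark.cores.max used purely as example keys; settings that affect the driver JVM itself (such as driver memory) generally have to go into spark-env.sh, spark-defaults.conf, or the launch command instead:

from pyspark import SparkContext, SparkConf

sc.stop()    # release the context the shell created for you

conf = (SparkConf()
        .setAppName("notebook-app")            # hypothetical application name
        .setMaster("local[4]")                 # or your cluster's master URL
        .set("spark.executor.memory", "2g")    # example resource settings
        .set("spark.cores.max", "4"))

sc = SparkContext(conf=conf)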

Upvotes: 0
