pltrdy

Reputation: 2109

Jupyter & PySpark: How to run multiple notebooks

I am using Spark 1.6.0 on three VMs: one master (standalone mode) and two workers with 8 GB RAM and 2 CPUs each.

I am using the kernel configuration below:

{
 "display_name": "PySpark",
 "language": "python",
 "argv": [
  "/usr/bin/python3",
  "-m",
  "IPython.kernel",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "SPARK_HOME": "<mypath>/spark-1.6.0",
  "PYTHONSTARTUP": "<mypath>/spark-1.6.0/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": "--master spark://<mymaster>:7077 --conf spark.executor.memory=2G --driver-class-path /opt/vertica/java/lib/vertica-jdbc.jar pyspark-shell"
 }
}
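
For reference, the spec is installed as a per-user kernel, roughly like this (the directory name pyspark is just my choice, nothing mandated by Jupyter):

# save the JSON above as kernel.json in a kernels directory,
# then check that Jupyter picks it up
mkdir -p ~/.local/share/jupyter/kernels/pyspark
cp kernel.json ~/.local/share/jupyter/kernels/pyspark/
jupyter kernelspec list    # 'pyspark' should appear in the output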

Currently, this works: I can use the Spark context sc and sqlContext without any imports, just as in the pyspark shell.
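
For example, a fresh notebook cell works without any imports:

print(sc.version)              # '1.6.0'
sqlContext.range(0, 4).show()  # trivial DataFrame, just to confirm the cluster answers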

The problem comes when I use multiple notebooks. On my Spark master I see two 'pyspark-shell' apps, which kind of makes sense, but only one can run at a time. And here 'running' does not mean executing anything: even when I do not run anything in a notebook, it is still shown as 'running'. Given this, I can't share my resources between notebooks, which is quite sad (I currently have to kill the first shell, i.e. the first notebook kernel, to run the second).
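
Side note, from what I understand of standalone mode (my reading, not verified): the first application claims every available core by default, so the second shell just waits for resources. Capping cores per application, e.g. adding --conf spark.cores.max=2 to PYSPARK_SUBMIT_ARGS, should at least let two apps coexist. From a running notebook I can check what was granted:

print(sc.master)               # spark://<mymaster>:7077
print(sc.defaultParallelism)   # roughly the total cores granted to this app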

If you have any ideas about how to do this, let me know! Also, I'm not sure whether the way I'm working with kernels is best practice; I already had trouble just getting Spark and Jupyter to work together.

Thx all

Upvotes: 8

Views: 2464

Answers (1)

pcc

Reputation: 61

The problem is the database Spark uses for its metastore, which is Derby by default. Derby is a lightweight database system that allows only one connection at a time, so only one Spark instance can use the metastore at once. The solution is to set up a database system that can handle multiple concurrent instances (postgres, mysql, ...).
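
You can see the limitation directly. A sketch, assuming both notebooks use the default embedded metastore from the same working directory: the first HiveContext boots Derby and takes an exclusive lock on the local metastore_db directory, and the second cannot boot it.

from pyspark.sql import HiveContext

hc = HiveContext(sc)   # notebook 1: creates ./metastore_db and locks it
hc.sql("show tables")  # the same two lines in notebook 2 fail with a Derby boot error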

For example, you can use a postgres database:

  • Add the postgres JDBC jar to spark/jars
  • Add a config file (hive-site.xml) to spark's conf directory
  • Install postgres on your machine
  • Create a user, a password and a database for spark/hive in postgres (these must match the values in your hive-site.xml)

Example in a Linux shell:

# download the postgres JDBC jar and put it where Spark can find it (step 1)
wget https://jdbc.postgresql.org/download/postgresql-42.1.4.jar
cp postgresql-42.1.4.jar $SPARK_HOME/jars/

# install the postgres server on your machine (e.g. on Debian/Ubuntu)
sudo apt-get install postgresql

# add the user, password and db to postgres (run as the postgres superuser)
sudo -u postgres psql -d postgres -c "create user hive"
sudo -u postgres psql -d postgres -c "alter user hive with password 'pass'"
sudo -u postgres psql -d postgres -c "create database hive_metastore"
sudo -u postgres psql -d postgres -c "grant all privileges on database hive_metastore to hive"

hive-site.xml:

<configuration>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/hive_metastore</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>pass</value>
  </property>

</configuration>
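
After restarting the notebook kernels, both pyspark-shell apps can use the metastore at the same time. A quick check from either notebook (in the pyspark shell, sqlContext is a HiveContext when the Hive classes are available):

print(type(sqlContext))               # pyspark.sql.context.HiveContext
sqlContext.sql("show tables").show()  # now served by the postgres-backed metastore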

Upvotes: 1
