alortimor

Reputation: 355

pyspark kernel with jupyter - Cannot find kernel

I am trying to use a PySpark kernel in Jupyter. I am new to both and have searched around trying to get PySpark 2.1.0 working in Jupyter.

I've installed pyspark 2.1.0 and anaconda3 on 64-bit Ubuntu 16.04 LTS. I've set up the following exports in .bashrc:

export SPARK_HOME=/usr/lib/spark
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
export SBT_HOME=/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
PYTHONPATH=/usr/lib/spark/python/lib/py4j-0.10.4-src.zip
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PATH=$PATH:/home/user1/course/research_methods/spin/Spin/Src6.4.6
export PYSPARK=/usr/lib/spark/bin
export PATH=$PATH:$PYSPARK

export PYSPARK_PYTHON=/home/user1/anaconda3/bin/python3
export PYSPARK_DRIVER_PYTHON=/home/user1/anaconda3/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# added by Anaconda3 4.2.0 installer
export PATH="/home/user1/anaconda3/bin:$PATH"
export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH

I've created the file "00-pyspark-setup.py" in ~/.jupyter/profile_spark/

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))

spark_release_file = spark_home + "/RELEASE"

if os.path.exists(spark_release_file) and "Spark 2.1.0" in open(spark_release_file).read():
  pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
  if not "pyspark-shell" in pyspark_submit_args: 
    pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

as recommended in this pyspark installation guide.

When I run the script, it produces the following output:

$ ./00-pyspark-setup.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 3.5.2 (default, Jul  2 2016 17:53:06)
SparkSession available as 'spark'.
$

When I open a .ipynb file in jupyter that has the following metadata:

 "metadata": {
  "kernelspec": {
   "display_name": "PySpark",
   "language": "python",
   "name": "pyspark"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
   "version": "3.5.2"
  }

I get the following error: "I couldn't find a kernel matching PySpark. Please select a kernel:". The "kernel" drop-down list next to the error message only offers two options, "Python [conda root]" and "Python [default]"; there is no pyspark option.
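
From what I understand, Jupyter resolves the name in the notebook's kernelspec against the installed kernelspec directories, so I assume it is looking for a kernel registered under the name "pyspark". A minimal sketch of what registering such a kernelspec might look like is below; the interpreter and Spark paths are only assumptions taken from my exports above, not a verified configuration:

import json
import os

# Sketch only: register a "pyspark" kernelspec for the current user.
# Paths are assumed from the .bashrc exports above.
kernel_dir = os.path.expanduser("~/.local/share/jupyter/kernels/pyspark")
os.makedirs(kernel_dir, exist_ok=True)

spec = {
    "display_name": "PySpark",
    "language": "python",
    "argv": ["/home/user1/anaconda3/bin/python3",
             "-m", "ipykernel", "-f", "{connection_file}"],
    "env": {
        "SPARK_HOME": "/usr/lib/spark",
        "PYTHONPATH": "/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip",
        "PYSPARK_SUBMIT_ARGS": "--master local[2] pyspark-shell"
    }
}

with open(os.path.join(kernel_dir, "kernel.json"), "w") as f:
    json.dump(spec, f, indent=2)

I haven't tried this yet, so I may be misunderstanding how the kernel lookup works.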

Can anybody suggest what I need to modify to make pyspark available?

Thanks

Upvotes: 2

Views: 1365

Answers (1)

user7504365

Reputation: 36

.bashrc -> py4j-0.10.4-src.zip

and

00-pyspark-setup.py -> py4j-0.8.2.1-src.zip

You are referencing two different py4j versions; make them match.
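
One way to avoid hardcoding the py4j version in 00-pyspark-setup.py is to pick up whichever py4j zip ships with the Spark install. This is only a sketch, assuming SPARK_HOME is set as in the question:

import glob
import os
import sys

# Sketch: add Spark's Python bindings and the bundled py4j zip to sys.path,
# instead of hardcoding a specific py4j version.
spark_home = os.environ["SPARK_HOME"]
sys.path.insert(0, os.path.join(spark_home, "python"))
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
if py4j_zips:
    sys.path.insert(0, py4j_zips[0])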

Upvotes: 1
