sequoia00

Reputation: 111

ToreeInstall ERROR | Unknown interpreter PySpark. Toree cannot install PySpark

To install PySpark for Jupyter Notebook, I used this command:

jupyter toree install  --kernel_name=tanveer --interpreters=PySpark --python="/usr/lib/python3.6"

But I get this error:

[ToreeInstall] ERROR | Unknown interpreter PySpark. Skipping installation of PySpark interpreter

So I don't know what the problem is. I have set up Toree's Scala and SQL interpreters successfully. Thanks.

Upvotes: 9

Views: 2314

Answers (2)

Averell

Reputation: 843

As also mentioned in Lee's answer, Toree version 0.3.0 removed support for PySpark and SparkR. Per the release notes, users are asked to "use specific kernels". For PySpark, this means installing pyspark manually and using it with Jupyter.

The steps are simple:

  1. Install pyspark, either with pip install pyspark or by downloading an Apache Spark binary package and decompressing it into a folder of your choice (see the sketch after these steps).
  2. Add the following 3 environment variables. How to do this depends on your OS; for example, on macOS I added the following lines to ~/.bash_profile:

    export SPARK_HOME=<path_to_your_installed_spark_files>
    export PYSPARK_DRIVER_PYTHON="jupyter"
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    
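If you installed via pip (step 1), the snippet below is a minimal sketch of that route; the one-liner is just one way to locate the pip-installed Spark files for SPARK_HOME and assumes a standard site-packages layout:

    # Step 1 via pip, plus one way to find a value for SPARK_HOME
    pip install pyspark
    # The pip package ships the Spark files inside the pyspark package directory:
    python -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))"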

That's it. To start your PySpark Jupyter Notebook, simply run "pyspark" from your command line and choose the "Python" kernel.
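For example, on macOS/Linux the launch boils down to the following (assuming you added the variables to ~/.bash_profile as above):

    # Reload the profile so the new variables take effect, then launch.
    # With PYSPARK_DRIVER_PYTHON="jupyter" and PYSPARK_DRIVER_PYTHON_OPTS="notebook",
    # running "pyspark" opens Jupyter Notebook in the browser instead of the PySpark shell.
    source ~/.bash_profile
    pyspark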

Refer to
https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781788835367/1/ch01lvl1sec17/installing-jupyter
or
https://opensource.com/article/18/11/pyspark-jupyter-notebook
for more detailed instructions.

Upvotes: 4

Lee

Reputation: 445

Toree version 0.3.0 removed support for PySpark and SparkR:

Removed support for PySpark and Spark R in Toree (use specific kernels)

Release notes here: incubator-toree release notes

I am not sure what "use specific kernels" means and am still looking for a Jupyter PySpark kernel.

Upvotes: 5
