Clive DaSilva

Reputation: 31

Problems after conda install PySpark on Windows 10

From a Udemy course about a year ago, I installed PySpark (version 1.1, I think) on my Windows 10 laptop and used it with Jupyter Notebook. A year later, I had to re-install Anaconda 3, etc., and everything seemed to work fine except running Spark commands. I installed PySpark with the following command: conda install -c conda-forge pyspark. Now when I try to use any of my Udemy scripts, I get the following:

Exception                                  Traceback (most recent call last)
<ipython-input-5-03dc2d316f89> in <module>()
----> 1 sc = SparkSession.builder.appName('Basics').getOrCreate()

~\Anaconda3\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)

167                     for key, value in self._options.items():
168                         sparkConf.set(key, value)
169                     sc = SparkContext.getOrCreate(sparkConf)
170                     # This SparkContext may be an existing one.
171                     for key, value in self._options.items():

I installed the latest PySpark, ver 2.2.0, and I have seen basically this same question asked before, with a slew of confusing responses. As I indicated, I did run an older version of PySpark on this Win 10 box a year ago.

Any ideas or hints?

Upvotes: 2

Views: 1799

Answers (1)

desertnaut

Reputation: 60318

PySpark from PyPI or Anaconda (i.e. installed with pip or conda) does not contain the full PySpark functionality; it is only intended for use with a Spark installation in an already existing cluster, in which case one may want to avoid downloading the whole Spark distribution locally. From the PyPI docs (this info should be on Anaconda Cloud, too, but unfortunately it is not):

The Python packaging for Spark is not intended to replace all of the other use cases. This Python packaged version of Spark is suitable for interacting with an existing cluster (be it Spark standalone, YARN, or Mesos) - but does not contain the tools required to setup your own standalone Spark cluster. You can download the full version of Spark from the Apache Spark downloads page.
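
To make the "interacting with an existing cluster" case concrete, here is a minimal sketch of how a pip/conda-installed PySpark is meant to be used; the master URL is a placeholder, not something taken from your setup:

    from pyspark.sql import SparkSession

    # Placeholder standalone master URL; on YARN you would use .master("yarn")
    # from a machine that already has the cluster's configuration available.
    spark = (SparkSession.builder
             .master("spark://some-master-host:7077")
             .appName("Basics")
             .getOrCreate())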

So, what you should do is download the complete Spark distribution (of which PySpark is an integral part) from the Apache Spark downloads page mentioned above. Certainly, this is exactly what you did in the past, since the pip/conda option became available only recently, with Spark 2.1.
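
Once the full distribution is unpacked locally, one common way to point a Jupyter notebook at it is sketched below; the install path, Spark version and py4j zip name are assumptions that depend on the exact download you pick:

    import os, sys

    # Wherever you extracted the downloaded distribution, e.g. spark-2.2.0-bin-hadoop2.7
    os.environ["SPARK_HOME"] = r"C:\spark\spark-2.2.0-bin-hadoop2.7"

    # Put the bundled PySpark (and its py4j) ahead of the conda-forge copy on sys.path
    sys.path.insert(0, os.path.join(os.environ["SPARK_HOME"], "python", "lib", "py4j-0.10.4-src.zip"))
    sys.path.insert(0, os.path.join(os.environ["SPARK_HOME"], "python"))

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Basics").getOrCreate()
    print(spark.version)   # should report the version of the downloaded distribution
    spark.stop()

Inserting the distribution's python directory at the front of sys.path is what makes the bundled PySpark win over the pip/conda copy already sitting in site-packages.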

Upvotes: 2
