pnv

Reputation: 1499

What is the difference between PySpark and Spark?

I am asking a question very similar to this SO question on PySpark and Spark. That answer explains that the pyspark installation does include Spark. What happens when I do this through Anaconda? And is there any other way to run this in PyCharm? I ask because my Jupyter notebooks run fine with it.

I am very confused about Spark and PySpark, starting right from the installation.

I understand that PySpark is a wrapper for writing scalable Spark scripts using Python. All I did was install it through Anaconda:

conda install pyspark

After that, I could import it in my script.
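
Just to confirm the install itself worked, this is the kind of quick check I run in a script (only a sanity check, nothing Spark-specific happens yet):

import pyspark
print(pyspark.__version__)  # prints the installed PySpark version, so the conda package is found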

But when I try to run scripts through PyCharm, the following warning comes up and the code just hangs there; it does not stop with an error, though.

Missing Python executable 'C:\Users\user\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Python 3.9', defaulting to 'C:\Users\user\AppData\Local\Programs\Python\Python39\Lib\site-packages\pyspark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.

It clearly says that these environment variables need to be set.
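
From what I have read, one way to handle this (I am not sure it is the right one) is to point those variables at the actual python.exe before the session is created. The interpreter path below is an assumption based on the warning above; on another machine it would come from sys.executable:

import os

# Assumption: this is where my Python 3.9 interpreter actually lives;
# replace with the path printed by `import sys; print(sys.executable)`.
python_exe = r"C:\Users\user\AppData\Local\Programs\Python\Python39\python.exe"
os.environ["PYSPARK_PYTHON"] = python_exe
os.environ["PYSPARK_DRIVER_PYTHON"] = python_exe

from pyspark.sql import SparkSession

# Build a local session only after the environment variables are set.
spark = SparkSession.builder.master("local[*]").appName("pycharm-check").getOrCreate()
print(spark.range(5).count())  # should print 5 if everything is wired up
spark.stop()

Setting the same two variables in the PyCharm run configuration (Run > Edit Configurations > Environment variables) should have the same effect.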

There are a lot of resources on installing Spark; I went through many of them and followed this:

I just don't understand how all of this fits together. This may be a very trivial question, but I am feeling rather helpless.

Thanks.

Upvotes: 3

Views: 3055

Answers (1)

pltc

Reputation: 6082

(Over)simplified explanation: Spark is a data processing framework. The Spark core is implemented in Scala and Java, but it also provides wrappers for other languages, including Python (PySpark), R (SparkR), and SQL (Spark SQL).

You can install Spark separately (which includes all of the wrappers), or install only the Python package by using pip or conda.
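
For example (a minimal sketch; the app name and data are just placeholders), after pip install pyspark or conda install pyspark, a script like this runs a local Spark job entirely through the Python wrapper, using the Spark distribution bundled inside the pyspark package:

from pyspark.sql import SparkSession

# Local-mode session: uses the Spark that ships inside the pyspark package,
# so no separate Spark installation is needed.
spark = SparkSession.builder.master("local[*]").appName("pyspark-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()  # the Python calls are translated into work on the Spark (JVM) core

spark.stop()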

Upvotes: 3
