Reputation: 1499
I am asking a question very similar to this SO question on pyspark and spark. Its answer explains that the pyspark installation already bundles Spark. What happens when I install it through Anaconda? And is there any other way to run it in PyCharm? My Jupyter notebooks run fine with it.
I am very confused about Spark and PySpark, starting right from the installation.
I understand that PySpark is a wrapper for writing scalable Spark scripts in Python. All I did was install it through Anaconda:
conda install pyspark
After that, I could import it in my scripts.
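As a quick check that the package is importable (a trivial sketch; the version printed depends on what conda resolved):

import pyspark
print(pyspark.__version__)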
But when I try to run scripts through PyCharm, this warning comes up and the code just hangs without actually stopping:
Missing Python executable 'C:\Users\user\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Python 3.9', defaulting to 'C:\Users\user\AppData\Local\Programs\Python\Python39\Lib\site-packages\pyspark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
It clearly says that these environment variables need to be set.
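Presumably something like the following would satisfy it (a minimal sketch, assuming the script should simply use the same interpreter that launches it; the variables are set before the first pyspark import):

import os
import sys

# Point both variables at the interpreter running this script;
# replace with an explicit path if PyCharm should use a different one.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

# Local-mode session just to confirm the warning is gone
spark = SparkSession.builder.master("local[*]").appName("check").getOrCreate()
print(spark.range(5).count())  # should print 5
spark.stop()

But I am not sure whether this is the right fix, or why it is needed at all.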
There are a lot of resources on installing Spark; I went through many of them and followed this:
I just don't understand how all of this fits together. This may be a very trivial question, but I am feeling quite helpless.
Thanks.
Upvotes: 3
Views: 3055
Reputation: 6082
(Over)simplified explanation: Spark is a data processing framework. The Spark core is implemented in Scala and Java, but it also provides wrappers for different languages, including Python (PySpark), R (SparkR), and SQL (Spark SQL).
You can install Spark separately (which includes all of the wrappers), or install the Python version only, using pip or conda.
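For example, the pip route is just pip install pyspark, and a minimal local-mode sanity check afterwards could look like this (the app name and data are arbitrary):

from pyspark.sql import SparkSession

# Local mode; no separate Spark installation is needed, because the
# pip/conda package ships the Spark jars along with the Python wrapper.
spark = SparkSession.builder.master("local[*]").appName("sanity").getOrCreate()
spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"]).show()
spark.stop()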
Upvotes: 3