Jason Taylor

Reputation: 15

Install pyspark for beginner

I am currently doing PySpark courses on DataCamp, and would now like to start building some of my own projects on my own computer using PySpark. However, I am getting massively confused by the installation of Spark/PySpark itself and how to run it in a Jupyter notebook.

I have looked at videos on YouTube about the installation, such as Edureka's, which seems to do the install by creating a VM and connecting it to another one. I do not want that; all I want is to install PySpark locally on my laptop.

I have also followed the installation instructions from this link :

https://medium.com/@brajendragouda/installing-apache-spark-on-ubuntu-pyspark-on-juputer-ca8e40e8e655

And when I run the command pyspark in my terminal I get a "command not found" response.

I have looked at the documentation on the Spark site, which I find not very newbie-friendly, and was wondering if anyone has a link to an easy-to-follow guide for this install.

My current OS is the latest version of Ubuntu. I am just learning about using the shell and bash scripts at present, but it is all very new and a lot of the material I have been looking at is starting to confuse me.

Any links or advice would be much appreciated.

Upvotes: 0

Views: 1793

Answers (2)

Matthew Son

Reputation: 1425

I've tried installing PySpark in several ways, and the easiest was using conda.

If you have Anaconda (or Miniconda) installed on your laptop, try installing as below.

conda install pyspark
conda install -c anaconda openjdk  
# The anaconda channel provides OpenJDK 8, which works best; do not install
# from conda-forge, as its version is 11 and it crashes
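After those two installs finish, a quick sanity check (a sketch, not part of the install itself) can confirm that pyspark is importable and which Java version is on your PATH:

```shell
# Confirm pyspark landed in the active conda environment, and show the Java
# version on PATH (the answer above recommends Java 8 over 11).
python -c "import importlib.util; print('pyspark', 'installed' if importlib.util.find_spec('pyspark') else 'MISSING')"
if command -v java >/dev/null; then java -version 2>&1 | head -n 1; else echo "java MISSING"; fi
```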

Add the SPARK_HOME variable: modify the path below for your machine and paste it into your .bashrc file.

export SPARK_HOME="/Users/YOUR_USER_NAME/miniconda3/lib/python3.7/site-packages/pyspark"
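The exact site-packages path varies with your conda prefix and Python version. A small stdlib-only sketch (an illustrative helper, not part of the answer's instructions) can print the path to put in that export line:

```python
# Build the expected SPARK_HOME for a conda-installed pyspark from the
# current interpreter's prefix, and report whether the directory exists.
import os
import sys

spark_home = os.path.join(
    sys.prefix,
    "lib",
    "python%d.%d" % (sys.version_info.major, sys.version_info.minor),
    "site-packages",
    "pyspark",
)
print(spark_home)
print("exists:", os.path.isdir(spark_home))
```

Run it with the conda environment active, and only use the printed path if "exists" comes back True.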

This worked for me. You probably also want to install findspark, which is easily available through conda.

Upvotes: 1

David

Reputation: 11583

There is a Docker PySpark image that makes the setup pretty easy. Here's a link describing the setup process. With Docker installed and running, the following command will launch a Jupyter notebook environment in which you can run PySpark:

docker run -it -p 8888:8888 jupyter/pyspark-notebook

This command mounts a temporary filesystem, though, which makes reading and saving data difficult. To point the environment at your own filesystem, run:

docker run -it --rm -p 8888:8888 -p 4040:4040 -p 4041:4041 -v /Users/your/path:/home/jovyan jupyter/pyspark-notebook
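To confirm the mount worked, a small stdlib-only helper (illustrative; /home/jovyan is the container-side path from the -v flag above) can be run in a notebook cell to list what the container sees:

```python
# List the entries visible under the container-side mount point, or return
# None when the directory does not exist (e.g. when run outside the container).
from pathlib import Path

def list_mount(path="/home/jovyan"):
    mount = Path(path)
    if not mount.is_dir():
        return None
    return sorted(entry.name for entry in mount.iterdir())

print(list_mount())
```

If this prints None inside the notebook, the -v mapping did not take effect; check that the host path exists and is absolute.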

Upvotes: 0
