Jason Donnald

Reputation: 2316

How to upgrade Spark to newer version?

I have a virtual machine which has Spark 1.3 on it, but I want to upgrade it to Spark 1.5, primarily because of certain functionalities that are not supported in 1.3. Is it possible to upgrade Spark from 1.3 to 1.5, and if so, how can I do that?

Upvotes: 15

Views: 39175

Answers (3)

Faitus Joseph

Reputation: 69

Below are step-by-step instructions on how to upgrade an Apache Spark pool in Azure Synapse to version 3.4.

Step 1:

Open a PowerShell session (e.g. Cloud Shell) from the Azure Portal.

Step 2:

Upgrade the Apache Spark pool using the Update-AzSynapseSparkPool PowerShell cmdlet, as shown below.

Check the current Spark version of the pool:

Get-AzSynapseSparkPool -WorkspaceName <Synapseworkspacename>

Update the Spark version:

Update-AzSynapseSparkPool -WorkspaceName <Synapseworkspacename> -Name <SparkPoolName> -SparkVersion 3.4


Upvotes: 1

Nabeel Ahmed

Reputation: 19282

  1. Set your SPARK_HOME to /opt/spark
  2. Download the latest pre-built binary, e.g. spark-2.2.1-bin-hadoop2.7.tgz - you can use wget
  3. Create the symlink to the latest download - ln -s /opt/spark-2.2.1 /opt/spark
  4. Edit the files in $SPARK_HOME/conf accordingly

For every new version you download, just recreate the symlink to it (step 3); a fuller sketch of all four steps follows the command below.

ln -s /opt/spark-x.x.x /opt/spark
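A minimal end-to-end sketch of those steps, assuming Spark 2.2.1, /opt as the install directory, and the Apache archive as the download source (the URL and directory names here are illustrative; pick a mirror from the Spark download page):

    # Step 2: download the pre-built binary into /opt
    cd /opt
    sudo wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
    # the tarball unpacks to spark-2.2.1-bin-hadoop2.7
    sudo tar -xzf spark-2.2.1-bin-hadoop2.7.tgz
    # Step 3: (re)point the symlink at the newly unpacked directory
    # -f replaces an existing link, -n avoids descending into the old target
    sudo ln -sfn /opt/spark-2.2.1-bin-hadoop2.7 /opt/spark
    # Step 1: SPARK_HOME stays /opt/spark across upgrades, e.g. in ~/.bashrc
    export SPARK_HOME=/opt/spark
    export PATH=$SPARK_HOME/bin:$PATH

Because SPARK_HOME always points at the symlink, upgrading again only requires repeating the download, unpack, and ln -sfn steps.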

Upvotes: 3

desertnaut

Reputation: 60390

Pre-built Spark distributions, like the one I believe you are using based on another question of yours, are rather straightforward to "upgrade", since Spark is not actually "installed". Actually, all you have to do is:

  • Download the appropriate Spark distro (pre-built for Hadoop 2.6 and later, in your case)
  • Unpack the tar file in the appropriate directory (i.e. where the folder spark-1.3.1-bin-hadoop2.6 already is)
  • Update your SPARK_HOME (and possibly some other environment variables depending on your setup) accordingly

Here is what I just did myself, to go from 1.3.1 to 1.5.2, in a setting similar to yours (a Vagrant VM running Ubuntu):

  1. Download the tar file in the appropriate directory

    vagrant@sparkvm2:~$ cd $SPARK_HOME
    vagrant@sparkvm2:/usr/local/bin/spark-1.3.1-bin-hadoop2.6$ cd ..
    vagrant@sparkvm2:/usr/local/bin$ ls
    ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
    ipcluster2    ipengine       iptest2  jsonschema
    ipcontroller  ipengine2      ipython  pygmentize
    vagrant@sparkvm2:/usr/local/bin$ sudo wget http://apache.tsl.gr/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
    [...]
    vagrant@sparkvm2:/usr/local/bin$ ls
    ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
    ipcluster2    ipengine       iptest2  jsonschema  spark-1.5.2-bin-hadoop2.6.tgz
    ipcontroller  ipengine2      ipython  pygmentize

Note that the exact mirror you should use with wget will probably be different from mine, depending on your location; you will get it by clicking the "Download Spark" link on the download page, after you have selected the package type to download.

  2. Unpack the tgz file with

    vagrant@sparkvm2:/usr/local/bin$ sudo tar -xzf spark-1.*.tgz
    vagrant@sparkvm2:/usr/local/bin$ ls
    ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
    ipcluster2    ipengine       iptest2  jsonschema  spark-1.5.2-bin-hadoop2.6
    ipcontroller  ipengine2      ipython  pygmentize  spark-1.5.2-bin-hadoop2.6.tgz

You can see that now you have a new folder, spark-1.5.2-bin-hadoop2.6.

  3. Update SPARK_HOME (and possibly other environment variables you are using) accordingly, so that it points to this new directory instead of the previous one; for example:
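If your environment variables live in ~/.bashrc, a minimal sketch might look like this (paths taken from the walkthrough above; adjust to wherever you actually set them):

    # Old setting, now obsolete:
    # export SPARK_HOME=/usr/local/bin/spark-1.3.1-bin-hadoop2.6
    # New setting, pointing to the freshly unpacked distribution:
    export SPARK_HOME=/usr/local/bin/spark-1.5.2-bin-hadoop2.6
    export PATH=$SPARK_HOME/bin:$PATH

Then open a new shell (or source ~/.bashrc) for the change to take effect.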

And you should be done, after restarting your machine.

Notice that:

  1. You don't need to remove the previous Spark distribution, as long as all the relevant environment variables point to the new one. That way, you may even quickly move "back-and-forth" between the old and new version, in case you want to test things (i.e. you just have to change the relevant environment variables).
  2. sudo was necessary in my case; it may be unnecessary for you depending on your settings.
  3. After ensuring that everything works fine, it's a good idea to delete the downloaded tgz file (see below why).
  4. You can use the exact same procedure to upgrade to future versions of Spark, as they come out (rather fast). If you do this, either make sure that previous tgz files have been deleted, or modify the tar command above to point to a specific file (i.e. no * wildcards as above), as shown below.
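For instance, a version-specific unpack command (no wildcard, file name as in the walkthrough above) would be:

    sudo tar -xzf spark-1.5.2-bin-hadoop2.6.tgz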

Upvotes: 23
