Ben Irving

Reputation: 493

AWS EMR Spark 1.0

Is there a way to force Amazon EMR to use Spark 1.0.1? The current selectable versions stop at 1.4.1.

I am using the Alternating Least Squares (ALS) implementation in MLlib. Since v1.1 it applies weighted regularization, which I do not want for a research study. Instead, I am trying to access the non-weighted regularization that was implemented in v1.0.

I am using Zeppelin notebooks with Scala, if that helps.

Upvotes: 0

Views: 283

Answers (3)

Jean-Marc S.

Reputation: 421

Is working with Zeppelin a requirement? If so, this could be very difficult: Zeppelin is compiled against a specific version of Spark, so downgrading the jar will most likely fail.

Otherwise, if you are OK with dropping Zeppelin and using the EMR Step API instead, you might be able to spin up an EMR cluster with a bootstrap action that installs spark-assembly 1.0.1. I say "might" because there is no guarantee that the current EMR version is compatible with a two-year-old version of Spark.

To create the cluster:

To run Spark using the EMR Step API, upload your compiled jar to S3, then submit a step against that cluster. You will need:

  • Cluster ID: the ID of your cluster (e.g. j-XXXXXXXX)
  • Cluster region: where you created your EMR cluster (e.g. us-west-2)
  • Your Spark main class: this is where you put your ML pipeline code
  • Your jar: you have to upload the jar with your code to S3 so your cluster can download it
  • arg1, arg2: arguments to your main (optional)

aws emr add-steps --cluster-id --steps \
Name=SparkPi,Jar=s3://.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn,--class,com.your.spark.class.MainApp,s3://>/your.jar,arg1,arg2],ActionOnFailure=CONTINUE

(Taken from the official github repo at https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/spark-submit-via-step.md)
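As a rough sketch of how the inputs listed above fit into that command, the helper below assembles the --steps value and prints it without calling AWS. The function name build_spark_step, the bucket my-bucket, and the argument order are all made up for illustration, and it assumes the part before ".elasticmapreduce" in the script-runner path is the cluster region, as the parameter list suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the AWS CLI): builds the --steps value
# from a region, a main class, a jar URI, and optional program arguments.
build_spark_step() {
  local region="$1" main_class="$2" jar_uri="$3"
  shift 3
  # spark-submit invocation, comma-separated as in the Args=[...] list
  local args="/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn,--class,${main_class},${jar_uri}"
  local extra
  for extra in "$@"; do
    args="${args},${extra}"   # append optional arguments to your main
  done
  printf 'Name=SparkPi,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s],ActionOnFailure=CONTINUE\n' \
    "$region" "$args"
}

# Dry run: print the step definition instead of submitting it.
# "my-bucket" is a placeholder bucket name.
build_spark_step us-west-2 com.your.spark.class.MainApp s3://my-bucket/your.jar arg1 arg2

# To submit for real (needs AWS credentials and a running cluster):
# aws emr add-steps --cluster-id j-XXXXXXXX \
#   --steps "$(build_spark_step us-west-2 com.your.spark.class.MainApp s3://my-bucket/your.jar arg1 arg2)"
```

Printing the string first makes it easy to check the commas and brackets before handing it to aws emr add-steps.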

If that also fails, install Hadoop yourself and check out https://spark.apache.org/docs/1.0.1/running-on-yarn.html

Or you could also run 1.0.1 locally on your laptop if your data is small.

Good luck.

Upvotes: 1

spew

Reputation: 94

EMR supports Spark 1.6.0. Take a look at their latest release of emr-4.4.0: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-whatsnew.html

Upvotes: 0

PinoSan

Reputation: 1508

Amazon EMR provides a list of supported software package versions that you can install by selecting them from a drop-down menu. Nothing stops you from installing additional custom software with a bootstrap action. I had some experience installing Java 8 when EMR supported only Java 7. It is a bit painful but totally possible.
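For illustration only, a bootstrap action is attached when the cluster is created. The sketch below just prints a create-cluster command rather than running it. The script location, cluster name, instance type and count, and the function name print_create_cluster_cmd are placeholders I made up, and the install script itself is something you would have to write and upload to S3.

```shell
#!/usr/bin/env bash
# Placeholder values: the bucket, script name, cluster name, and instance
# settings are illustrative only, not taken from the question.
BOOTSTRAP_SCRIPT="s3://my-bucket/install-custom-software.sh"

# Hypothetical helper: prints (does not run) a create-cluster command that
# attaches the given script as a bootstrap action.
print_create_cluster_cmd() {
  local script_uri="$1"
  printf '%s\n' "aws emr create-cluster --name custom-software-cluster --release-label emr-4.4.0 --instance-type m3.xlarge --instance-count 3 --use-default-roles --bootstrap-actions Path=${script_uri},Name=InstallCustomSoftware"
}

print_create_cluster_cmd "$BOOTSTRAP_SCRIPT"
```

The script named in Path runs on the cluster nodes while they are being provisioned, which is where a custom install would go.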

Upvotes: 0
