Reputation: 395
I am trying to add extra libraries to the Scala used through spark-shell on an Elastic MapReduce instance, but I am unsure how to go about this. Is there a build tool that is used when spark-shell runs?
All I need to do is install a Scala library and have it available in the spark-shell version of Scala. I'm not sure how to go about this, since I'm not sure how the EMR instance installs Scala and Spark.
Upvotes: 2
Views: 4446
Reputation: 40380
I think this answer will evolve with the information you give. For now, assuming you have an AWS EMR cluster deployed on which you wish to use spark-shell, there are several options:
Option 1: You can copy your libraries to the cluster with the scp command and add them to your spark-shell with the --jars option, e.g.:
From your local machine:
scp -i awskey.pem /path/to/jar/lib.jar hadoop@emr-cluster-address:/path/to/destination
On your EMR cluster:
spark-shell --master yarn --jars lib.jar
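If you need several libraries, --jars accepts a comma-separated list. A minimal sketch, where other-lib.jar is just a placeholder for a second jar copied over the same way:
spark-shell --master yarn --jars lib.jar,other-lib.jar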
Spark uses several URL schemes to allow different strategies for disseminating jars (file:, hdfs:, http:, https:, ftp:, local:), so --jars is not limited to paths on the machine where you launch the shell.
Option 2: You can keep a copy of your libraries in S3 and add them with the --jars option.
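As a rough sketch (the bucket and paths are placeholders), you could pull the jar onto the master node with the AWS CLI, which ships with EMR, and then pass it to --jars as in Option 1:
aws s3 cp s3://my-bucket/libs/lib.jar /home/hadoop/lib.jar
spark-shell --master yarn --jars /home/hadoop/lib.jar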
Option 3: You can use the --packages option to load libraries from a remote repository. You can include any other dependencies by supplying a comma-delimited list of Maven coordinates; all transitive dependencies are handled automatically. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the --repositories flag. These flags can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.
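For example (the coordinates below are only illustrative; substitute the groupId:artifactId:version of the library you actually need):
spark-shell --master yarn --packages com.databricks:spark-csv_2.10:1.5.0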
For Python, the equivalent --py-files option can be used to distribute .egg, .zip and .py libraries to executors.
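A minimal sketch, assuming your Python dependencies are bundled in a hypothetical deps.zip:
pyspark --master yarn --py-files deps.zip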
Upvotes: 2