cs0815

Reputation: 17388

Using Spark with an AWS cluster

I set up a cluster successfully by following the instructions here. Can I invoke Spark via an API with this type of cluster, and where can I find the details of the Spark endpoint(s)? If the aforementioned tutorial is a dead end, could anyone point me in the right direction, please?

My ultimate aim for this proof of concept is to add two columns of a flat file (e.g. a CSV) in an S3 bucket and compare the result with a third column using Spark (this is not homework (-:), ideally with Mobius, as I am a [former] .NET developer.

Upvotes: 0

Views: 68

Answers (1)

Vidya

Reputation: 30300

This reference should give you the information you need. Here is a snippet:

"Go into the ec2 directory in the release of Apache Spark you downloaded. Run ./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>, where <keypair> is the name of your EC2 key pair (that you gave it when you created it), <key-file> is the private key file for your key pair, <num-slaves> is the number of slave nodes to launch (try 1 at first), and <cluster-name> is the name to give to your cluster.

For example:

export AWS_SECRET_ACCESS_KEY=AaBbCcDdEeFGgHhIiJjKkLlMmNnOoPpQqRrSsTtU
export AWS_ACCESS_KEY_ID=ABCDEFG1234567890123 

./spark-ec2 --key-pair=awskey --identity-file=awskey.pem --region=us-west-1 --zone=us-west-1a launch my-spark-cluster 

After everything launches, check that the cluster scheduler is up and sees all the slaves by going to its web UI, which will be printed at the end of the script (typically http://master-hostname:8080)."
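On the question of where the endpoints are: as a rough sketch, both can be derived from the master hostname that the launch script prints. The helper name `spark_endpoints` below is hypothetical (it is not part of spark-ec2), and the sketch assumes Spark's default standalone ports, 8080 for the web UI (as in the snippet above) and 7077 for the master itself.

```shell
#!/bin/sh
# Hypothetical helper, not part of spark-ec2: print the usual endpoints of a
# standalone Spark master, given its public hostname.
# Assumption: default ports are in use (8080 = web UI, 7077 = master).
spark_endpoints() {
    master_host="$1"
    echo "web-ui=http://${master_host}:8080"
    echo "master=spark://${master_host}:7077"
}

# The hostname is printed at the end of the launch. It can also be looked up
# later with the script's get-master action (this needs the AWS credentials
# exported as in the example above):
#   ./spark-ec2 --region=us-west-1 get-master my-spark-cluster
spark_endpoints "master-hostname"
```

The `spark://...:7077` value is what you would pass as `--master` to `spark-submit` or `spark-shell`. Whether those ports are reachable from outside the cluster depends on the security group rules, so the simplest first test is probably to log in to the master and submit the job from there.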

Upvotes: 1
