Reputation: 2608
I have a recurring problem: I start an AWS EMR cluster, log in via SSH, and run spark-shell
to test some Spark code. Sometimes I lose my internet connection and PuTTY throws an error that the connection was lost.
But it seems the Spark-related processes are still running. When I reconnect to the server and run spark-shell
again, I get a lot of these errors:
17/02/07 11:15:50 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1486465722770_0002_01_000003 on host: ip-172-31-0-217.eu-west-1.compute.internal. Exit status: 1. Diagnostics: Exception from container-launch.
Googling this error suggests there are problems with the allocated memory, but since I am using small nodes on a test cluster, I don't want to allocate more memory. I just want to release the resources being used and restart spark-shell,
but I don't see any "Spark" processes running.
How can I fix this easily? Is there some other process I should try closing or restarting, such as Hadoop, MapReduce, or YARN? I wouldn't want to start a new cluster every time I experience this.
Upvotes: 0
Views: 2297
Reputation: 5828
You can use the YARN CLI for that. After SSHing to the master node, run this:
yarn application -list
to see if there are applications running. If there are, you can kill them with this command:
yarn application -kill <application id>
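Putting the two commands together, here is a minimal sketch that kills every application still in the RUNNING state in one pass. It assumes the standard `yarn application -list` output format, where each application row starts with an ID of the form `application_<timestamp>_<seq>`; if your YARN version formats the listing differently, adjust the awk pattern accordingly.

```shell
#!/bin/sh
# List only applications currently in the RUNNING state
yarn application -list -appStates RUNNING

# Extract each application ID (first column of rows that contain one)
# and kill it. Orphaned spark-shell sessions show up here as RUNNING
# SPARK applications.
for app in $(yarn application -list -appStates RUNNING 2>/dev/null \
             | awk '/application_/ {print $1}'); do
  yarn application -kill "$app"
done
```

After this finishes, the cluster's resources are released and a fresh spark-shell should be able to get its containers allocated again.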
You can also do the same thing from the Resource Manager web UI (available as a link on the top of the cluster's EMR page).
By the way, you can use Zeppelin to run the same code you run in spark-shell without worrying about disconnecting. It is available on EMR (you need to select it as one of the applications when setting up the cluster).
It takes some time to learn how to use and configure it properly, but it might help you.
Upvotes: 2