V. Samma

Reputation: 2608

How to free up resources on AWS EMR cluster?

I have a recurring problem: I start an AWS EMR cluster, log in via SSH, and then run spark-shell to test some Spark code. Sometimes I lose my internet connection and PuTTY throws an error saying the connection was lost.

But it seems the Spark-related processes are still running. When I reconnect to the server and run spark-shell again, I get a lot of these errors:

17/02/07 11:15:50 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1486465722770_0002_01_000003 on host: ip-172-31-0-217.eu-west-1.compute.internal. Exit status: 1. Diagnostics: Exception from container-launch.

Googling this error suggests there are problems with the allocated memory, but as I am using small nodes on a test cluster, I don't even want to allocate more memory. I just want to release the resources in use and restart the spark-shell, but I don't see any "Spark" processes running.

How can I fix this easily? Is there some other process I should try closing/restarting, like hadoop, mapred, yarn, etc.? I wouldn't want to start a new cluster every time this happens.

Upvotes: 0

Views: 2297

Answers (1)

Tal Joffe

Reputation: 5828

You can use the YARN command-line interface for that. After SSH-ing to the master node, run this:

yarn application -list

to see if any applications are running. If there are, you can use this command to kill them:

yarn application -kill <application id>
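
If several stale applications are left over, a small shell loop can kill them all in one go. This is just a sketch; it assumes the default yarn application -list output, where the application ID is the first column after two header lines:

for app in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do
  yarn application -kill "$app"
done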

You can also use the ResourceManager web UI to do the same thing (it is available as a link on the cluster's page in the EMR console).
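
If the console link doesn't work for you, the ResourceManager UI is typically served on port 8088 of the master node, e.g. http://<master-public-dns>:8088 (assuming you have an SSH tunnel set up or the port open in the master security group).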

BTW, you can use Zeppelin to run the same things you run in spark-shell without worrying about disconnecting. It is available on EMR (you need to select it as one of the applications when setting up a cluster).

It takes some time to learn how to use and configure it properly, but it might help you.

Upvotes: 2
