Amit Kumar

Reputation: 2745

How can I retrieve worker information for a running application in Spark?

I want to get information about the workers that are being used by an application in a Spark cluster. I need their IP addresses, CPU cores, available memory, etc. Is there any API in Spark for this purpose? The snapshot above shows this information in the Spark UI, but I am not able to figure out how to get it from Java code.

This is specific to Java. I want the information for all worker nodes. Thanks.

Upvotes: 1

Views: 2599

Answers (1)

Radu

Reputation: 1128

There are multiple ways to do this:

  • Parse the log output and see which workers are started on each machine in your cluster. You can get the names/IPs of all the hosts, when tasks are started and where, how much memory each worker gets, etc. (see the first sketch after this list). If you want to see the exact hardware configuration, you will then need to log in to the worker nodes or use other tools.

  • The same information as in the web front end is contained in the event logs of the Spark applications (this is actually where the data you see comes from). I prefer the event log, as it is much easier to parse (in Python, for instance) than the log messages; a Java version is given in the second sketch below.

  • If you want real-time monitoring of the cluster, you can use either Ganglia (which gives nice graphical displays of CPU/memory/network/disk usage) or colmux, which gives you the same data in text format. I personally prefer colmux (easier to set up, immediate stats, etc.).
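
Here is a minimal Java sketch of the first approach (since the question asks for Java), assuming the driver log contains registration lines of the form "Registering block manager <host>:<port> with <memory>". The exact message wording varies across Spark versions, so check a real log from your cluster and adjust the pattern:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch: grep the driver log for block-manager registrations to see
    // which hosts joined the application and how much storage memory each
    // one offered. The message format is version-dependent; this pattern
    // targets lines like:
    //   "Registering block manager 192.168.1.5:45678 with 2004.6 MB RAM"
    public class DriverLogWorkers {
        private static final Pattern REGISTER = Pattern.compile(
                "Registering block manager ([\\w.\\-]+):(\\d+) with ([\\d.]+ [KMG]B)");

        public static void main(String[] args) throws IOException {
            // args[0]: path to the driver log file
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                Matcher m = REGISTER.matcher(line);
                if (m.find()) {
                    System.out.printf("worker host=%s port=%s memory=%s%n",
                            m.group(1), m.group(2), m.group(3));
                }
            }
        }
    }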
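
And a minimal Java sketch of the event-log approach. Event logs are written as one JSON object per line; the field names below ("Event", "Executor ID", "Executor Info", "Host", "Total Cores") match what Spark 2.x writes for SparkListenerExecutorAdded events, but verify them against a log produced by your own version. The sketch assumes Jackson is on the classpath (it ships with Spark):

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Sketch: scan an application's event log (one JSON object per line,
    // found under spark.eventLog.dir) for SparkListenerExecutorAdded events
    // and print the host and core count of each executor.
    public class EventLogWorkers {
        public static void main(String[] args) throws IOException {
            ObjectMapper mapper = new ObjectMapper();
            // args[0]: path to the application's event log file
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                JsonNode event = mapper.readTree(line);
                if ("SparkListenerExecutorAdded".equals(event.path("Event").asText())) {
                    JsonNode info = event.path("Executor Info");
                    System.out.printf("executor %s on host %s with %d cores%n",
                            event.path("Executor ID").asText(),
                            info.path("Host").asText(),
                            info.path("Total Cores").asInt());
                }
            }
        }
    }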

Upvotes: 2
