How to know if a machine in a Spark cluster participates in a job

Question

I want to know when it is safe to remove a machine (node) from a cluster.

My assumption is that it is safe to remove a machine if it is not running any containers and does not store any data that a running job still needs.

Using the ResourceManager REST API documented at https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html, we can call

 GET http:///ws/v1/cluster/nodes

to get information about each node, for example:


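    <node>
        <rack>/default-rack</rack>
        <state>RUNNING</state>
        <id>host1.domain.com:54158</id>
        <nodeHostName>host1.domain.com</nodeHostName>
        <nodeHTTPAddress>host1.domain.com:8042</nodeHTTPAddress>
        <lastHealthUpdate>1476995346399</lastHealthUpdate>
        <version>3.0.0-SNAPSHOT</version>
        <healthReport></healthReport>
        <numContainers>0</numContainers>
        <usedMemoryMB>0</usedMemoryMB>
        <availMemoryMB>8192</availMemoryMB>
        <usedVirtualCores>0</usedVirtualCores>
        <availableVirtualCores>8</availableVirtualCores>
        <resourceUtilization>
            <nodePhysicalMemoryMB>1027</nodePhysicalMemoryMB>
            <nodeVirtualMemoryMB>1027</nodeVirtualMemoryMB>
            <nodeCPUUsage>0.006664445623755455</nodeCPUUsage>
            <aggregatedContainersPhysicalMemoryMB>0</aggregatedContainersPhysicalMemoryMB>
            <aggregatedContainersVirtualMemoryMB>0</aggregatedContainersVirtualMemoryMB>
            <containersCPUUsage>0.0</containersCPUUsage>
        </resourceUtilization>
    </node>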

If numContainers is 0, I assume the node is not running any containers. However, can it still hold data on disk that downstream tasks may need to read?
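For reference, the numContainers check could be sketched in Python roughly as follows. This is only a sketch under assumptions: `rm_address` is a placeholder for the ResourceManager's host and port, the helper names are mine, and the endpoint is asked for JSON (where the list is wrapped as `nodes.node`) instead of XML.

```python
import json
from urllib.request import Request, urlopen


def fetch_nodes(rm_address):
    """Return the node list from the ResourceManager REST API as a list of dicts.

    rm_address is a placeholder such as "rm-host:8088" (an assumption,
    not a value taken from my cluster).
    """
    req = Request(
        f"http://{rm_address}/ws/v1/cluster/nodes",
        headers={"Accept": "application/json"},
    )
    with urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    # The JSON form wraps the list as {"nodes": {"node": [...]}};
    # "nodes" may be null when the cluster reports no nodes.
    return (payload.get("nodes") or {}).get("node", [])


def nodes_without_containers(nodes):
    """Return host names of RUNNING nodes that report numContainers == 0."""
    return [
        n["nodeHostName"]
        for n in nodes
        if n.get("state") == "RUNNING" and int(n.get("numContainers", 0)) == 0
    ]
```

Calling `nodes_without_containers(fetch_nodes("rm-host:8088"))` would list candidate hosts, but it only tells me that no container is running there right now, not whether the node still holds data.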

I could not find out whether Spark exposes this information. I assume that if a machine still stores data useful to a running job, it maintains a heartbeat with the Spark driver or some central controller. Could we check this by scanning its TCP or UDP connections?

Is there any other way to check whether a machine in a Spark cluster participates in a job?
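One thing I have considered is Spark's monitoring REST API, which lists the active executors of an application together with fields such as `hostPort`, `activeTasks`, `rddBlocks`, `diskUsed` and `totalShuffleWrite`. The sketch below groups them by host. It rests on assumptions: `driver_ui_address` and `app_id` are placeholders, the driver UI must be reachable directly, and the function names are mine. I am also not sure it covers shuffle files that outlive an executor, for example when an external shuffle service is used.

```python
import json
from urllib.request import urlopen


def fetch_executors(driver_ui_address, app_id):
    """Return the active executors of one application as a list of dicts.

    driver_ui_address (e.g. "driver-host:4040") and app_id are placeholders.
    """
    url = f"http://{driver_ui_address}/api/v1/applications/{app_id}/executors"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_executor_hosts(executors):
    """Group executors by host and note running tasks and held data."""
    summary = {}
    for ex in executors:
        # hostPort looks like "host:port"; keep only the host part.
        host = ex["hostPort"].rsplit(":", 1)[0]
        entry = summary.setdefault(host, {"active_tasks": 0, "holds_data": False})
        entry["active_tasks"] += ex.get("activeTasks", 0)
        # Heuristic only: cached blocks, spilled data or written shuffle output
        # suggest the host holds something a later stage may read.
        if (
            ex.get("rddBlocks", 0) > 0
            or ex.get("diskUsed", 0) > 0
            or ex.get("totalShuffleWrite", 0) > 0
        ):
            entry["holds_data"] = True
    return summary
```

A host that is missing from the result has no active executor for that application. A host that is present with `holds_data` set to `True` is probably not safe to remove, although this is a heuristic and not a guarantee.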
