Ged

Reputation: 18003

"total-executor-cores" parameter in Spark in relation to Data Nodes

Another topic about which I have read little.

Leaving S3 aside, and not being in a position just now to try out a classic bare-metal data-locality setup with Spark and Hadoop, and assuming Dynamic Resource Allocation is not in use: if "total-executor-cores" restricts the application to fewer executors than there are Data Nodes holding the relevant blocks, will Spark acquire additional nodes so that reads stay node-local, or will it read remotely with the resources it already has?
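For concreteness, here is a minimal sketch of the kind of job I have in mind (standalone cluster manager; the app name, path, and core counts are just illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Standalone mode: cap the total cores across the whole application.
// Equivalent to passing --total-executor-cores 8 to spark-submit.
val spark = SparkSession.builder()
  .appName("locality-question")         // illustrative name
  .config("spark.cores.max", "8")       // the "total-executor-cores" cap
  .config("spark.executor.cores", "4")  // cores per executor => at most 2 executors
  .getOrCreate()

// With only 2 executors, blocks of this file may well live on
// Data Nodes where no executor was started.
val lines = spark.sparkContext.textFile("hdfs:///data/events.log")  // hypothetical path
println(lines.getNumPartitions)
```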

Thanks in advance.

Upvotes: 0

Views: 328

Answers (1)

user10964773

Reputation: 26

There seems to be a little bit of confusion here.

Optimal data locality (node-local) is something we want to achieve, not something that can be guaranteed. All Spark can do is request resources (for example from YARN - see How YARN knows data locality in Apache spark in cluster mode) and hope that it gets resources which satisfy the data locality constraints.
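You can inspect the preferences Spark derives from the HDFS block locations yourself. A small sketch, assuming an existing SparkContext `sc` and a hypothetical path:

```scala
val rdd = sc.textFile("hdfs:///data/events.log")  // hypothetical path

// For each partition, Spark records the Data Nodes holding the underlying
// block; these are preferences handed to the scheduler, not guarantees.
rdd.partitions.foreach { p =>
  println(s"partition ${p.index}: preferred on ${rdd.preferredLocations(p).mkString(", ")}")
}
```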

If it doesn't, it will simply fetch the data from remote nodes. That is not a shuffle, however; it is just a plain transfer over the network.
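To see the difference, compare the lineage of a plain read with that of a wide transformation; only the latter shows a shuffle boundary. A sketch, assuming the same `sc` and hypothetical path:

```scala
// A plain read: a single stage, even when some partitions are fetched
// from remote Data Nodes (locality level RACK_LOCAL or ANY).
val lines = sc.textFile("hdfs:///data/events.log")
println(lines.toDebugString)   // MapPartitionsRDD <- HadoopRDD, no shuffle

// A wide transformation introduces an actual shuffle boundary.
val counts = lines.map(line => (line, 1)).reduceByKey(_ + _)
println(counts.toDebugString)  // a ShuffledRDD appears in the lineage
```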

So, to answer your question: Spark will use the resources that have been allocated, doing its best to satisfy the locality constraints. It cannot use nodes that haven't been acquired, so it won't automatically get additional nodes for reads.
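If locality matters a lot for your workload, the knob to look at is spark.locality.wait: the scheduler waits that long for a node-local slot to free up before falling back to rack-local and then to any node. A sketch (the 10s value is arbitrary; the default is 3s):

```scala
import org.apache.spark.sql.SparkSession

// Wait longer for a node-local slot before degrading to a less
// local level; trades scheduling latency for locality.
val spark = SparkSession.builder()
  .appName("locality-tuning")            // illustrative name
  .config("spark.locality.wait", "10s")
  .getOrCreate()
```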

Upvotes: 1
