JRhino

Reputation: 99

Does Spark Utilize HDFS Centralized Cache?

Just wondering if Spark utilizes HDFS Centralized Caching; I can't seem to find this asked anywhere.

e.g.

 hiveContext.sql("SELECT * FROM A_TABLE")

Would this utilize the cached blocks?

Upvotes: 2

Views: 674

Answers (1)

Gagan Taneja

Reputation: 26

It does use HDFS cached blocks, but it is not currently optimized for them. For example, a block might be cached on nodeA while the task is scheduled on nodeB. If the block is local to nodeB, it will be read from nodeB's disk; if it is not local, HDFS will make sure to read it from nodeA, where it is cached. I have a JIRA task open to optimize this, although it is not yet merged into the Spark trunk: https://issues.apache.org/jira/browse/SPARK-19705
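For reference, a minimal sketch of what this looks like in practice: the cache directive is created on the HDFS side with `hdfs cacheadmin`, and the Spark read itself does not change, since HDFS serves cached replicas transparently. The pool name and warehouse path below are placeholders; adjust them to your layout.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Cache the table's HDFS directory first, outside Spark, as the HDFS admin:
    //   hdfs cacheadmin -addPool spark_pool
    //   hdfs cacheadmin -addDirective -path /user/hive/warehouse/a_table -pool spark_pool
    // (pool name and path are illustrative)

    val conf = new SparkConf().setAppName("centralized-cache-read")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // The read is unchanged: no Spark-side switch is needed for the HDFS cache.
    // Whether a task actually lands on the node holding the cached replica
    // depends on scheduling, as described above.
    val df = hiveContext.sql("SELECT * FROM A_TABLE")
    df.show()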

Upvotes: 1
