iCode

Reputation: 4338

Data locality when HDFS is not used

What happens to the data locality feature of the Map/Reduce portion of Hadoop when you give it a storage system other than HDFS, such as a MySQL server? My understanding is that Hadoop Map/Reduce uses data locality to try to launch each map task on the same node where its input data is stored. When the data is stored in a SQL server, however, there is no local data on the task nodes, because all of the data lives on the SQL server node. So do we lose data locality in that case, or does the definition of data locality change? If it changes, what is the new definition?
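To make the locality preference described in the question concrete, here is a minimal, hypothetical sketch of locality-aware placement. It is not Hadoop's actual scheduler: the class `LocalityPlacement`, the `place` method and the node names are invented for illustration, and the real schedulers also have a rack-local level that this sketch omits. The sketch only shows that a task goes to a node holding its input when one is free, and that a split with no host list (data on an external MySQL server) has nothing to prefer.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of locality-aware map-task placement.
// Not Hadoop's scheduler; rack-locality and delay scheduling are omitted.
final class LocalityPlacement {

    enum Locality { NODE_LOCAL, REMOTE }

    record Assignment(String node, Locality locality) {}

    // splitHosts: nodes that store the split's data (empty if the data lives outside the cluster)
    // freeNodes:  nodes with a free task slot, in the scheduler's preference order
    static Optional<Assignment> place(List<String> splitHosts, List<String> freeNodes) {
        // First choice: a free node that already holds the data.
        for (String node : freeNodes) {
            if (splitHosts.contains(node)) {
                return Optional.of(new Assignment(node, Locality.NODE_LOCAL));
            }
        }
        // No free slots at all: the task has to wait.
        if (freeNodes.isEmpty()) {
            return Optional.empty();
        }
        // Otherwise run anywhere; the input must then be read over the network.
        return Optional.of(new Assignment(freeNodes.get(0), Locality.REMOTE));
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("node1", "node2", "node3");
        // An HDFS block whose replicas sit on node2 and node3.
        System.out.println(place(List.of("node2", "node3"), cluster));
        // A split read from a MySQL server outside the cluster: no hosts to prefer.
        System.out.println(place(List.of(), cluster));
    }
}
```

With an empty host list the first loop can never match, so every placement of a database-backed split is a remote one; locality does not get redefined, it simply has nothing to act on.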

Upvotes: 2

Views: 623

Answers (1)

Steve Severance

Reputation: 6646

There is no data locality if the data is not in the cluster. All of the data must be copied from the remote source, which is the same situation as when a task cannot be scheduled on a node that holds its data in HDFS. Several input formats read from remote sources, including ones for S3, HBase and relational databases (DBInputFormat). If you can put your data in HDFS, that is great. I use MongoDB as a remote source quite regularly for small amounts of frequently updated data, and I have been happy with the results.
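The equivalence drawn in the answer (a remote source behaves like an HDFS split whose task landed on the wrong node) can be sketched by counting the bytes a task must pull over the network. Hadoop's real `InputSplit` exposes `getLength()` and `getLocations()`; the `Split` interface and the `HdfsBlockSplit`/`JdbcRangeSplit` records below are hypothetical stand-ins with the same shape, so the example runs without Hadoop on the classpath. Treating a local read as zero network bytes is a simplifying assumption.

```java
import java.util.List;

// Hypothetical split types shaped like Hadoop's InputSplit contract
// (a length plus the hosts holding the data). Names are illustrative only.
final class RemoteSourceSketch {

    interface Split {
        long lengthBytes();
        String[] locations(); // hosts storing the data; empty means no locality hint
    }

    // A block of an HDFS file: the data nodes holding its replicas are known.
    record HdfsBlockSplit(long lengthBytes, String[] locations) implements Split {}

    // A range of rows fetched from a database over JDBC: the data lives on the
    // database server, so no cluster node can be preferred.
    record JdbcRangeSplit(long lengthBytes) implements Split {
        public String[] locations() {
            return new String[0];
        }
    }

    // Bytes the task running on `node` must copy over the network for `split`.
    static long remoteBytes(Split split, String node) {
        for (String host : split.locations()) {
            if (host.equals(node)) {
                return 0L; // assumed to be read from local disk
            }
        }
        return split.lengthBytes(); // everything is copied from elsewhere
    }

    // Total network bytes for a job, given the node each split was assigned to.
    static long totalRemoteBytes(List<Split> splits, List<String> assignedNodes) {
        long total = 0L;
        for (int i = 0; i < splits.size(); i++) {
            total += remoteBytes(splits.get(i), assignedNodes.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        Split hdfs = new HdfsBlockSplit(128 * mb, new String[] {"node1", "node2"});
        Split db = new JdbcRangeSplit(128 * mb);
        System.out.println(remoteBytes(hdfs, "node1") / mb); // 0: runs on a replica holder
        System.out.println(remoteBytes(hdfs, "node3") / mb); // 128: HDFS, but no local slot
        System.out.println(remoteBytes(db, "node1") / mb);   // 128: remote database, always copied
    }
}
```

The second and third cases cost the same, which is the point made above: once the data is not on the node running the task, it does not matter whether it sits in HDFS on another node or in an external database.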

Upvotes: 4
