Reputation: 4634
While trying to understand the Hadoop
architecture, I want to clear up a few things.
When there is a big data input, HDFS will divide it into many chunks (64 MB or 128 MB per chunk) and then replicate each chunk several times, storing them as blocks, right?
However, I still don't understand where MapReduce
fits in. Is it used to divide and merge the data for storage, or is it used to produce some useful output?
Upvotes: 1
Views: 1169
Reputation: 3798
Storing data in HDFS is a very different thing from analyzing it with the MapReduce paradigm.
When uploaded to HDFS, big data files are split into blocks which are stored in the datanodes, and each block is replicated as many times as the configured replication factor (by default, 3). Splitting the data is as simple as dividing the file by the configured block size.
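If you want to see those blocks and replicas for yourself, here is a minimal sketch using the HDFS Java client API (the path in args[0] is hypothetical; block size and replication come from whatever your cluster's hdfs-site.xml configures):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster settings from core-site.xml / hdfs-site.xml on the classpath
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]); // e.g. /user/hadoop/bigfile.txt (hypothetical path)

        FileStatus status = fs.getFileStatus(file);
        System.out.println("block size  = " + status.getBlockSize());   // e.g. 134217728 (128 MB)
        System.out.println("replication = " + status.getReplication()); // e.g. 3

        // Each BlockLocation is one chunk of the file plus the datanodes holding its replicas
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
    }
}
```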
MapReduce, as said, is a programming paradigm for analyzing big data files in order to obtain value-added information. In a few words, each file block is assigned to a map task so that all the mappers perform the same operation on the chunks; once they finish, the partial results are sent to the reducers, which aggregate the data in some way.
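To make that flow concrete, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API (class names such as TokenMapper and SumReducer are just illustrative): each mapper receives one input split (normally one HDFS block) and emits (word, 1) pairs, and each reducer receives all the pairs for one word and sums them.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper works on one block/split of the input file
    // and emits a (word, 1) pair for every word it sees.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: all counts emitted for the same word end up in one reduce call,
    // which aggregates them into a single total.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input read from HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output written back to HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

You would package this into a jar and submit it with something like `hadoop jar wordcount.jar WordCount /input /output` (jar name and paths here are hypothetical); HDFS handles where the blocks live, while MapReduce handles the computation over them and writes the aggregated output back to HDFS.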
Upvotes: 1