Reputation: 4634
While trying to understand the Hadoop
architecture, I want to clear up a few things.
When there is a big data input, HDFS will divide it into many chunks (64 MB or 128 MB per chunk) and then replicate each chunk several times, storing them as blocks, right?
However, I still don't understand where MapReduce
fits in. Is it used to divide and merge the data for storage, or is it used to produce some useful output?
Upvotes: 1
Views: 1169
Reputation: 3798
Storing data in HDFS is a very different thing from analyzing it with the MapReduce paradigm.
When uploaded to HDFS, big data files are split into blocks which are stored in the datanodes, and each block is replicated as many times as the configured replication factor (by default, 3). Splitting the data is as simple as dividing the file by the configured block size.
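If you want to see those blocks and replicas for yourself, here is a minimal sketch using the HDFS Java client API (the path in args[0] is hypothetical; block size and replication come from whatever your cluster's hdfs-site.xml configures):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster settings from core-site.xml / hdfs-site.xml on the classpath
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]); // e.g. /user/hadoop/bigfile.txt (hypothetical path)

        FileStatus status = fs.getFileStatus(file);
        System.out.println("block size  = " + status.getBlockSize());   // e.g. 134217728 (128 MB)
        System.out.println("replication = " + status.getReplication()); // e.g. 3

        // Each BlockLocation is one chunk of the file plus the datanodes holding its replicas
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
    }
}
```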
MapReduce, as said, is a programming paradigm for analyzing big data files in order to obtain value-added information. In a few words, each file block is assigned to a map task so that all the mappers perform the same operation on the chunks; once they finish, the partial results are sent to the reducers, which aggregate the data in some way.
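To make that flow concrete, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API (class names such as TokenMapper and SumReducer are just illustrative): each mapper receives one input split (normally one HDFS block) and emits (word, 1) pairs, and each reducer receives all the pairs for one word and sums them.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper works on one block/split of the input file
    // and emits a (word, 1) pair for every word it sees.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: all counts emitted for the same word end up in one reduce call,
    // which aggregates them into a single total.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input read from HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output written back to HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

You would package this into a jar and submit it with something like `hadoop jar wordcount.jar WordCount /input /output` (jar name and paths here are hypothetical); HDFS handles where the blocks live, while MapReduce handles the computation over them and writes the aggregated output back to HDFS.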
Upvotes: 1