user9252846


How does MapReduce work? Did I get it right?

I'm trying to understand how MapReduce actually works. Please read what I've written below and tell me if there are any missing parts or incorrect things in it. Thank you.

The data is first split into what are called input splits (logical groups whose size we define according to our record-processing needs). Then there is a Mapper for every input split, which takes that split and sorts it by key and value. Then there is the shuffling process, which takes all of the key-value data from the mappers and merges identical keys with their values (its output is every key with its list of values). The shuffling happens so that the reducer receives, as input, one entry per distinct key with its summed values. Then the Reducer merges all the key-value pairs into one place (a page, maybe?), which is the final result of the MapReduce process. We only have to define the code for the Map step (which always outputs key-value pairs) and the Reduce step (which produces the final result: it takes the key-value input and can count, sum, average, etc.).
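The division of labour in the last sentence (you write only the map and reduce functions; splitting and shuffling are done for you) can be illustrated with a minimal in-memory sketch. All names here (`run_mapreduce`, `make_splits`, `split_size`, `city_temperature`, `average`) are hypothetical, the temperature readings are made-up sample data, and splits are cut by record count purely for simplicity; Hadoop itself sizes input splits in bytes (by default roughly the HDFS block size).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


# Hypothetical toy framework: names and the record-count split size are illustrative only.
def make_splits(records: List[str], split_size: int) -> List[List[str]]:
    """Cut the input into logical input splits of at most split_size records."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_mapreduce(
    records: List[str],
    map_fn: Callable[[str], Iterable[Tuple[str, float]]],
    reduce_fn: Callable[[str, List[float]], float],
    split_size: int = 2,
) -> Dict[str, float]:
    # Map: conceptually one mapper per split, calling map_fn once per record.
    intermediate: List[Tuple[str, float]] = []
    for split in make_splits(records, split_size):
        for record in split:
            intermediate.extend(map_fn(record))

    # Shuffle: the framework, not the user, groups every value under its key.
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce: reduce_fn is called once per distinct key, keys in sorted order.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


# User code: the only two functions you write.
def city_temperature(record: str) -> Iterable[Tuple[str, float]]:
    city, temp = record.split(",")
    yield city, float(temp)


def average(city: str, temps: List[float]) -> float:
    return sum(temps) / len(temps)


readings = ["paris,20", "oslo,5", "paris,24", "oslo,7", "rome,30"]
result = run_mapreduce(readings, city_temperature, average)
print(result)  # {'oslo': 6.0, 'paris': 22.0, 'rome': 30.0}
```

Note that the map step here does no sorting, and the reducer receives a list of raw values per key (not pre-summed ones); the aggregation, an average in this case, happens entirely inside `reduce_fn`.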

Upvotes: 0

Views: 1690

Answers (2)

Pradeep Bhadani

Reputation: 4721

Similar Q&A: "Simple explanation of MapReduce?"

Also, this post explains Hadoop (HDFS and MapReduce) in a very simple way: https://content.pivotal.io/blog/demystifying-apache-hadoop-in-5-pictures

Upvotes: 0

Gyanendra Dwivedi

Reputation: 5538

Your understanding is slightly wrong, especially regarding how the mapper works. Here is a very nice picture that explains it in simple terms:

[Image: the MapReduce phases illustrated with bundles of chocolates]

It is similar to the word count program, where:

  • Each bundle of chocolates is an InputSplit, which is handled by one mapper. So we have 3 bundles.
  • Each chocolate is a word. One or more words (making up a sentence) form a record that is input to the mapper. So one input split may contain multiple records, and each record is passed, one at a time, to that split's mapper.
  • The mapper counts the occurrences of each word (chocolate) and emits the counts. Note that the mapper works on only one line (record) at a time; as soon as it is done with it, it picks the next record from the input split. (2nd phase in the image)

  • Once the map phase is finished, sorting and shuffling take place, collecting the counts for the same chocolate into one bucket. (3rd phase in the image)

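  • Each reducer call gets one bucket, with the name of the chocolate (the word) as the key and a list of counts as the value. So the reduce function is called once for every distinct word in the whole input file; the number of reducer tasks is configured separately, and one reducer task can process many words.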
  • The reducer iterates through the counts, sums them up to produce the final count, and emits it against the word.
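The whole chocolate/word count flow in the list above can be sketched in plain Python. This is a minimal in-memory illustration, not Hadoop code: the chocolate names, the three splits, `NUM_REDUCERS = 2` and the helper names (`mapper`, `partition`, `reducer`) are all hypothetical, and `zlib.crc32` is only a deterministic stand-in for a hash partitioner. Keys are routed to a fixed number of reducer tasks, so one task ends up holding several words while `reducer` is still called once per word.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Three input splits ("bundles"); each string is one record (a line of words/chocolates).
splits: List[List[str]] = [
    ["kitkat twix", "snickers kitkat"],
    ["twix twix", "mars kitkat"],
    ["snickers", "mars twix kitkat"],
]
NUM_REDUCERS = 2  # arbitrary; in Hadoop this is a job setting, not the number of distinct words


def mapper(record: str) -> List[Tuple[str, int]]:
    """Count each word in one record and emit (word, count) pairs."""
    return list(Counter(record.split()).items())


def partition(word: str, num_reducers: int) -> int:
    """Pick a reducer task for a key; crc32 is a stand-in for a hash partitioner."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the per-record counts for one word."""
    return word, sum(counts)


# Map phase: every record of every split goes through the mapper.
map_output: List[Tuple[str, int]] = []
for split in splits:
    for record in split:
        map_output.extend(mapper(record))

# Shuffle and sort: send each pair to its reducer task, grouping counts per word.
buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for word, count in map_output:
    buckets[partition(word, NUM_REDUCERS)][word].append(count)

# Reduce phase: each task calls reducer once per word it holds, in sorted key order.
final: Dict[str, int] = {}
for bucket in buckets:
    for word in sorted(bucket):
        key, total = reducer(word, bucket[word])
        final[key] = total

print(sorted(final.items()))  # [('kitkat', 4), ('mars', 2), ('snickers', 2), ('twix', 4)]
```

Emitting `(word, 1)` for every occurrence instead of per-record counts, as the classic Hadoop WordCount example does, gives the same final totals; counting inside the mapper just means fewer pairs to shuffle.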

The diagram below shows how a single input split of the word count program is processed:

[Image: how a single input split of the word count program is processed]
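A rough sketch of what happens inside one input split (not a reproduction of the diagram) might look like this. It assumes one record per line, as with Hadoop's default line-oriented text input, and uses hypothetical function names. The final sort stands in for the map-side sort that map output goes through before the shuffle; spilling, combiners and partitioning are left out.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def read_records(split_text: str) -> Iterator[str]:
    """Yield one non-empty line at a time, like a line-oriented record reader."""
    for line in split_text.splitlines():
        if line.strip():
            yield line


def map_record(record: str) -> List[Tuple[str, int]]:
    """Word count for a single record: emit (word, count) pairs."""
    return list(Counter(record.split()).items())


def run_one_mapper(split_text: str) -> List[Tuple[str, int]]:
    """Process one input split record by record, then sort the output by key."""
    output: List[Tuple[str, int]] = []
    for record in read_records(split_text):
        # The next record is read only after this one has been mapped.
        output.extend(map_record(record))
    # Stand-in for the map-side sort that happens before the shuffle.
    return sorted(output, key=lambda pair: pair[0])


one_split = "kitkat twix\nsnickers kitkat\n"
print(run_one_mapper(one_split))
# [('kitkat', 1), ('kitkat', 1), ('snickers', 1), ('twix', 1)]
```

The two `('kitkat', 1)` pairs come from different records and are only combined later, when the reducer for `kitkat` sums its list of counts.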

Upvotes: 1
