Reputation: 6139
What do you think the answer to Question 4 mentioned on this site should be?
Is the answer right or wrong?
QUESTION: 4
In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?
A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
Answer: A
and
QUESTION: 3
What happens in a MapReduce job when you set the number of reducers to one?
A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.
Answer: A
From my understanding, the answers to the above questions should be:
Question 4: D
Question 3: B
UPDATE
You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options
A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive
Answer: B
and for the updated question I am doubtful between B and C.
EDIT
Right Answer: Sqoop.
Upvotes: 2
Views: 1313
Reputation: 3154
As far as my understanding goes, both of the given answers are wrong. I haven't worked much with the Combiner, but everywhere I've seen it described, it works on the outputs of the Mapper, aggregating them locally before the shuffle. The answer to Question 4 should be D.
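To make that concrete, here is a sketch of the standard word count driver, based closely on the stock Apache example (the nested TokenizerMapper/IntSumReducer classes mirror the ones shipped with Hadoop). The line that matters for Question 4 is setCombinerClass; the reducer can double as the combiner because summing counts is associative and commutative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // one (word, 1) pair per token
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner runs on each mapper's local output before the
    // shuffle, collapsing many (word, 1) pairs into one (word, n)
    // pair per word per mapper, so far fewer key-value pairs travel
    // across the network to the reducers (option D).
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```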
Again, from practical experience I've found that the number of output files is always equal to the number of Reducers, so the answer to Question 3 should be B. This may not be the case when using MultipleOutputs, but that's not common.
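For Question 3, the setting in play is the reduce task count (continuing the driver sketch above):

```java
// With exactly one reduce task, every mapper's output is shuffled to
// a single reducer, and the job's output directory ends up with a
// single data file, part-r-00000, in HDFS (option B).
job.setNumReduceTasks(1);
```

Each reduce task writes its own part-r-NNNNN file, which is why the file count tracks the reducer count rather than the mapper count.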
Finally, I don't think Apache would lie about MapReduce (exceptions do occur :)). The answers to both questions are available on their wiki page; have a look.
By the way, I liked the "100% Pass-Guaranteed or Your Money Back!!!" quote on the link you provided ;-)
EDIT
Not sure about the question in the update section, since I have little knowledge of Pig & Sqoop. But the same can certainly be achieved using Hive by creating external tables over the HDFS data and then joining.
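For what it's worth, the Hive route would look roughly like this; the table layouts, column names, and paths below are invented for illustration, and it assumes the profile data is somehow already sitting in HDFS:

```sql
-- Both tables point at data already in HDFS; names, columns, and
-- paths are hypothetical, invented for this sketch.
CREATE EXTERNAL TABLE weblogs (user_id INT, url STRING, ts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

CREATE EXTERNAL TABLE user_profiles (user_id INT, name STRING, email STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/user_profiles';

-- Join each weblog hit to the matching user profile.
SELECT u.name, w.url, w.ts
FROM weblogs w
JOIN user_profiles u ON (w.user_id = u.user_id);
```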
UPDATE
After comments from user milk3422 and the owner, I did some searching and found that my assumption of Hive being the answer to the last question is wrong, since another OLTP database is involved. The proper answer should be C, as Sqoop is designed to transfer data between HDFS and relational databases.
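A minimal Sqoop import along those lines might look like this (the JDBC URL, credentials, table name, and target directory are all placeholders, not values from the question):

```bash
# Import the user profile table from the OLTP database into HDFS.
# The JDBC URL, credentials, table name, and target directory are
# placeholders.
sqoop import \
  --connect jdbc:mysql://oltp-host:3306/appdb \
  --username appuser -P \
  --table user_profiles \
  --target-dir /data/user_profiles \
  --num-mappers 4
```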
Upvotes: 5
Reputation: 521
The answers for questions 4 and 3 seem correct to me. For question 4 it's quite justifiable because, while using a combiner, the map output is kept in a collection and processed first, and then the buffer is flushed when full. To justify this I will add this link: http://wiki.apache.org/hadoop/HadoopMapReduce
Here it clearly states why a combiner will add speed to the process.
Also, I think the answer to question 3 is correct because, in general, that's the basic configuration followed by default. To justify that I will add another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types
Upvotes: 0