Reputation: 6139
What do you think the answer to Question 4 mentioned on this site should be?
Is the answer right or wrong?
QUESTION: 4
In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?
A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
Answer: A
and
QUESTION: 3
What happens in a MapReduce job when you set the number of reducers to one?
A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.
Answer: A
From my understanding, the answers to the above questions should be:
Question 4: D
Question 3: B
UPDATE
You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options
A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive
Answer: B
and for the updated question I am doubtful between B and C.
EDIT
Right Answer: Sqoop.
Upvotes: 2
Views: 1313
Reputation: 3154
As far as my understanding goes, both of the given answers are wrong. I haven't worked much with the Combiner, but everywhere I've seen it described, it works on the outputs of the Mapper, aggregating them locally before the shuffle. The answer to Question 4 should be D.
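To make that concrete, here is a sketch of the standard word count driver, based closely on the stock Apache example (the nested TokenizerMapper/IntSumReducer classes mirror the ones shipped with Hadoop). The line that matters for Question 4 is setCombinerClass; the reducer can double as the combiner because summing counts is associative and commutative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // one (word, 1) pair per token
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner runs on each mapper's local output before the
    // shuffle, collapsing many (word, 1) pairs into one (word, n)
    // pair per word per mapper, so far fewer key-value pairs travel
    // across the network to the reducers (option D).
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```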
Again, from practical experience I've found that the number of output files is always equal to the number of Reducers, so the answer to Question 3 should be B. This may not be the case when using MultipleOutputs, but that's not common.
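For Question 3, the setting in play is the reduce task count (continuing the driver sketch above):

```java
// With exactly one reduce task, every mapper's output is shuffled to
// a single reducer, and the job's output directory ends up with a
// single data file, part-r-00000, in HDFS (option B).
job.setNumReduceTasks(1);
```

Each reduce task writes its own part-r-NNNNN file, which is why the file count tracks the reducer count rather than the mapper count.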
Finally, I don't think Apache would lie about MapReduce (exceptions do occur :)). The answers to both questions are available on their wiki page; have a look.
By the way, I liked the "100% Pass-Guaranteed or Your Money Back!!!" quote on the link you provided ;-)
EDIT
Not sure about the question in the update section, since I have little knowledge of Pig & Sqoop. But the same can certainly be achieved using Hive by creating external tables over the HDFS data and then joining.
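For what it's worth, the Hive route would look roughly like this; the table layouts, column names, and paths below are invented for illustration, and it assumes the profile data is somehow already sitting in HDFS:

```sql
-- Both tables point at data already in HDFS; names, columns, and
-- paths are hypothetical, invented for this sketch.
CREATE EXTERNAL TABLE weblogs (user_id INT, url STRING, ts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

CREATE EXTERNAL TABLE user_profiles (user_id INT, name STRING, email STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/user_profiles';

-- Join each weblog hit to the matching user profile.
SELECT u.name, w.url, w.ts
FROM weblogs w
JOIN user_profiles u ON (w.user_id = u.user_id);
```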
UPDATE
After comments from user milk3422 and the owner, I did some searching and found that my assumption of Hive being the answer to the last question is wrong, since another OLTP database is involved. The proper answer should be C, as Sqoop is designed to transfer data between HDFS and relational databases.
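A minimal Sqoop import along those lines might look like this (the JDBC URL, credentials, table name, and target directory are all placeholders, not values from the question):

```bash
# Import the user profile table from the OLTP database into HDFS.
# The JDBC URL, credentials, table name, and target directory are
# placeholders.
sqoop import \
  --connect jdbc:mysql://oltp-host:3306/appdb \
  --username appuser -P \
  --table user_profiles \
  --target-dir /data/user_profiles \
  --num-mappers 4
```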
Upvotes: 5
Reputation: 521
The answers for questions 4 and 3 seem correct to me. For question 4 it's quite justifiable because, while using a combiner, the map output is kept in a collection and processed first, and then the buffer is flushed when full. To justify this I will add this link: http://wiki.apache.org/hadoop/HadoopMapReduce
Here it clearly states why a combiner will add speed to the process.
Also, I think the answer to question 3 is correct because, in general, that's the basic configuration followed by default. To justify that I will add another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types
Upvotes: 0