Reputation: 399
I have a homework assignment in which I must retrieve the total number of distinct words in a certain document.
It's very similar to the WordCount example provided by Hadoop. But now I just want the total number of distinct words in the document. In the console output the number of reduce input groups corresponds to the total number of distinct words.
Is there a simple way to retrieve this number without even reducing the data? Or is Map/Reduce not the way to go for this problem? Chaining jobs could also be a solution, but since the answer already appears in the console output of the job, I'm wondering whether there is a simple way to retrieve the number of reduce input groups without doing work that isn't needed.
Greetings, Hadoop newcomer
Upvotes: 2
Views: 1023
Reputation: 39893
At some point you have to group the data, because there is no way to check for distinctness without bringing the data together.
Well, you are right about how to cheat. And by cheat, I mean it is how I would do this in a production environment just because of how simple it is, even though it feels dirty anyway.
In your console output, look for "Reduce input groups=". This tells you how many groups your reducers received. One group maps to one key, which means each unique key is represented once.
Reduce input groups=146030
You could make your own counter to count the groups, but the number will be the same.
... Then use grep or something like that to yank it out.
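If you do want your own counter, a minimal sketch would be to bump it once per reduce() call, since reduce() runs exactly once per key. The enum name WordStats.DISTINCT_WORDS here is just something I made up:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DistinctCountingReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Hypothetical counter enum; pick any group/name you like.
    public enum WordStats { DISTINCT_WORDS }

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // reduce() is called once per key, so this counter ends up equal to
        // "Reduce input groups" from the console output.
        context.getCounter(WordStats.DISTINCT_WORDS).increment(1);

        // Normal word-count summing, unchanged.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```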
You can also query the job status through the API in the driver if you want to grab the counter value.
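Assuming you are on the new org.apache.hadoop.mapreduce API and job is the Job object in your driver, something like this should work once the job finishes; the built-in counter you want is TaskCounter.REDUCE_INPUT_GROUPS:

```java
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

// ... in the driver, after submitting the word-count job:
if (job.waitForCompletion(true)) {
    Counter distinct = job.getCounters()
            .findCounter(TaskCounter.REDUCE_INPUT_GROUPS);
    System.out.println("Distinct words: " + distinct.getValue());
}
```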
Your other option, which is obviously slower because it is an additional job: first phase, do the word count; second phase, do a line count over the word-count output.
The general way to do a line count is to emit the same dummy string as the key, along with a 1, for each row. Basically, your map function is solely context.write(dummyText, one). Be sure to use a combiner and set the number of reducers to 1.
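Roughly, that second job could look like the sketch below (new mapreduce API assumed; class names and the input/output paths are placeholders). It points the mapper at the word-count output, reuses the reducer as a combiner, and forces a single reducer so you get one total:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LineCount {

    // Every input line (one distinct word from the word-count output)
    // is emitted under the same dummy key with a count of 1.
    public static class LineCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final Text DUMMY = new Text("lines");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(DUMMY, ONE);
        }
    }

    // Sums the 1s; used as both combiner and the single reducer.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "line count");
        job.setJarByClass(LineCount.class);
        job.setMapperClass(LineCountMapper.class);
        job.setCombinerClass(SumReducer.class);   // collapses each mapper's output locally
        job.setReducerClass(SumReducer.class);
        job.setNumReduceTasks(1);                 // single reducer produces one total
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // word-count output dir
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The combiner matters here because everything goes to one key: without it, every mapper would ship one record per input line to the lone reducer, whereas with it each mapper sends a single pre-summed record.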
Upvotes: 1