Brian

Reputation: 7326

Hadoop - word count per node

I am implementing a customized version of WordCount.java in Hadoop, and I would like it to output the word counts per node.

For example, given the text:

FindMe FindMe ..... .... .... .. more big text ... FindMe FindMe FindMe

I would like the output to be:

FindMe node01: 2
FindMe node02: 3

Here is a snippet from my Mapper:

String searchString = "FindMe";
while (itr.hasMoreTokens()) {
  String token = itr.nextToken();
  if (token.equals(searchString)) {
    word.set(token);
    context.write(word, one);
  }
}

This code outputs:

FindMe n

where n is the total number of occurrences across all of the input.

How can I output the count for each node along with some kind of identifier for this node like the example I provided above?

Upvotes: 1

Views: 130

Answers (1)

Karthik

Reputation: 1811

You can emit the word together with the hostname as the key in the mapper. The counts are then summed separately for each node:

// Look up the name of the machine this map task is running on.
java.net.InetAddress localMachine = java.net.InetAddress.getLocalHost();
String computerName = localMachine.getHostName();
String searchString = "FindMe";
while (itr.hasMoreTokens()) {
  String token = itr.nextToken();
  if (token.equals(searchString)) {
    // Key is "<word> <hostname>", so each node gets its own count.
    word.set(token + " " + computerName);
    context.write(word, one);
  }
}
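
To see why the composite key splits the totals, here is a minimal sketch in plain Java that imitates the behaviour outside Hadoop. The class name PerNodeCountSketch, the methods mapSplit and run, the two sample splits, and the node names node01 and node02 are all made up for illustration; in a real job the node name would come from InetAddress.getLocalHost().getHostName() as above, and the summing would be done by the reducer rather than by a map.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class PerNodeCountSketch {

    // Imitates one map task: for every matching token it "emits"
    // (token + " " + nodeName, 1). Merging into `counts` stands in for
    // the reducer, which sums the values of identical keys.
    static void mapSplit(String split, String searchString, String nodeName,
                         Map<String, Integer> counts) {
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if (token.equals(searchString)) {
                counts.merge(token + " " + nodeName, 1, Integer::sum);
            }
        }
    }

    // Runs two hypothetical input splits on two hypothetical nodes.
    static Map<String, Integer> run() {
        Map<String, Integer> counts = new TreeMap<>();
        mapSplit("FindMe FindMe more big text", "FindMe", "node01", counts);
        mapSplit("more text FindMe FindMe FindMe", "FindMe", "node02", counts);
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per distinct "<word> <node>" key.
        run().forEach((key, n) -> System.out.println(key + ": " + n));
    }
}
```

Because the node name is part of the key, occurrences found on different machines no longer collapse into a single total. In the actual Mapper, if the hostname lookup currently sits inside map(), it may be worth moving it into setup(Context) so that it runs once per task instead of once per record.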

Upvotes: 2
