user2552010

Reputation: 23

Hadoop write output into a normal file

I want to write the reducer result into a normal file (e.g. a .csv or .log file) instead of writing it into HDFS, so I use the following code in my reducer class:

@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {

    // Count the number of values for this key
    long sum = 0;
    for (LongWritable value : values) {
        sum++; 
    }

    context.write(key, new LongWritable(sum));
    System.out.println(key + " : " + sum);
    Main.map.put(key.toString(), sum);
}

In the Main class I then print the map's contents into a .csv file. However, after the reducer finishes, the file is empty. I found that the map is empty because the reducer never puts anything into it, and I also don't see any of the System.out.println(key + " : " + sum) output from the reducer in the console.

How can that be? Are the records not being processed in the reducer class?

Upvotes: 0

Views: 937

Answers (1)

Mike Park

Reputation: 10931

Let's get down to the root of the issue here. Each map or reduce task is launched in its own Java Virtual Machine (JVM). These JVMs do not share memory with each other.

Let's say you have the following setup:

  • jvm-1 : JobClient (this is your main driver class)
  • jvm-2 : Reducer task (this is the JVM your reducer is running in)

This is what happens:

  1. jvm-1 initiates the map/reduce job
  2. jvm-2 puts an item in Main.map<K,V>
  3. map/reduce job finishes.
  4. jvm-1 tries to read from Main.map<K,V>, but there's nothing there, because jvm-2 wrote to a map in its own memory that jvm-1 never sees (see the sketch after this list for one way to get results back into the driver).
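
One way around this, as a minimal sketch: let the job write its results to HDFS as usual, then have the driver read them back once job.waitForCompletion(true) returns and copy them into a local .csv. The output directory and part-file name below are assumptions, not something from your question; adjust them to your own job.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyResultsToCsv {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Assumed output location of the finished job; adjust to your own paths.
        Path resultFile = new Path("output/part-r-00000");

        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(fs.open(resultFile)));
             PrintWriter out = new PrintWriter("results.csv")) {
            String line;
            while ((line = in.readLine()) != null) {
                // Reducer output is "key<TAB>sum"; rewrite it as CSV.
                out.println(line.replace('\t', ','));
            }
        }
    }
}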

A similar thing happens with System.out: it may not actually be attached to your console's stdout stream. It's likely (if you have a multi-node setup) that the output is going to another machine on the network.
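
If all you need back in the driver are a few numeric totals, Hadoop counters are another option, since they are aggregated by the framework across JVMs and can be read from the Job object in the driver. This is a sketch of an alternative approach, not what your current code does; the counter group "key-sums" and the key "someKey" are made up, and counters only suit a small, bounded number of values.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reducer side: publish each per-key sum as a counter instead of a static map.
public class CountingReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable value : values) {
            sum++;
        }
        context.write(key, new LongWritable(sum));
        // Aggregated by the framework and visible to the driver after the job ends.
        context.getCounter("key-sums", key.toString()).increment(sum);
    }
}

// Driver side, after job.waitForCompletion(true):
//   long sum = job.getCounters().findCounter("key-sums", "someKey").getValue();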

Upvotes: 1
