JPC

Reputation: 5173

how to custom select column reading for hadoop input in java for map reducer job

New to Hadoop and I'm trying to understand how Hadoop reads file input. I am able to use the code below to run a Hadoop job from a 2-column (key/value) input file:
[screenshot: sample two-column (key/value) input file]

But what if I have 5 columns and the key/value pair I want is A&E (instead of A&B)? Which function exactly do I need to modify?

[screenshot: sample multi-column input file]

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class InverterCounter extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value,
                        OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {

            // Invert: emit the input value as the key and the input key as the value
            output.collect(value, key);
        }
    }   
    public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, IntWritable> {

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {

            // Count how many records arrived with this key
            int count = 0;
            while (values.hasNext()) {
                values.next();
                count++;
            }
            output.collect(key, new IntWritable(count));
        }
    }   
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();      
        JobConf job = new JobConf(conf, InverterCounter.class);    
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("InverterCounter");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);   
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        // The map emits Text/Text while the reduce emits Text/IntWritable,
        // so the map output types must be declared separately
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.set("key.value.separator.in.input.line", ",");       
        JobClient.runJob(job);      
        return 0;
    }  
    public static void main(String[] args) throws Exception { 
        int res = ToolRunner.run(new Configuration(), new InverterCounter(), args);       
        System.exit(res);
    }
}

Any recommendation would be appreciated. I was trying to change job.set("key.value.separator.in.input.line", ",") and job.setInputFormat(KeyValueTextInputFormat.class) with no luck; I still could not figure this out.

Thanks

Upvotes: 1

Views: 2724

Answers (1)

Jeremy Beard

Reputation: 2725

KeyValueTextInputFormat assumes that the key is at the start of each line, so it isn't applicable to your six-column data set.
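To see why, note that KeyValueTextInputFormat splits each line only at the first occurrence of the separator, so with a comma separator every column after the first is lumped together into the value. A rough sketch of that splitting rule (an illustration, not the actual Hadoop internals), using a hypothetical line a,b,c,d,e,f:

String line = "a,b,c,d,e,f";
int pos = line.indexOf(",");              // only the first separator counts
String key = line.substring(0, pos);      // "a"         -> column A
String value = line.substring(pos + 1);   // "b,c,d,e,f" -> all remaining columns

This is why changing key.value.separator.in.input.line alone could never select column E on its own: no separator setting makes the split happen anywhere but at the first match.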

Instead, you can use TextInputFormat and extract the key and value yourself. I'm assuming all values in the line are separated by commas (and that there are no commas in the data, which is another story).

With TextInputFormat you receive the full line in value, and the byte offset of the line within the file in key (as a LongWritable). We don't need the offset, so we will ignore it. With the full line in a single Text we can turn it into a String, split it by commas, and derive the key and value to emit:

// Same imports as the version above, except replace KeyValueTextInputFormat
// with org.apache.hadoop.mapred.TextInputFormat and add org.apache.hadoop.io.LongWritable
public class InverterCounter extends Configured implements Tool {

    // With TextInputFormat the input key is the line's byte offset in the
    // file, so the mapper's input key type is LongWritable rather than Text
    public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {

            // Split the six comma-separated columns; A and E (indices 0 and 4)
            // form the key, while B, C, D and F form the value
            String[] lineFields = value.toString().split(",");
            Text outputKey = new Text(lineFields[0] + "," + lineFields[4]);
            Text outputValue = new Text(lineFields[1] + "," + lineFields[2] + "," +
                                        lineFields[3] + "," + lineFields[5]);

            output.collect(outputKey, outputValue);
        }
    }   
    public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, IntWritable> {

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {

            int count = 0;
            while (values.hasNext()) {
                values.next();
                count++;
            }
            output.collect(key, new IntWritable(count));
        }
    }   
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();      
        JobConf job = new JobConf(conf, InverterCounter.class);    
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("InverterCounter");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);   
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        // The map emits Text/Text while the reduce emits Text/IntWritable,
        // so the map output types must be declared separately
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        JobClient.runJob(job);
        return 0;
    }  
    public static void main(String[] args) throws Exception { 
        int res = ToolRunner.run(new Configuration(), new InverterCounter(), args);       
        System.exit(res);
    }
}

I haven't had a chance to test this, so there may be small bugs. You would probably want to rename the class because it is no longer inverting anything. Finally, the value has been sent to the reducer but it isn't being used, so you could just as easily send a NullWritable instead.
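A sketch of that NullWritable variant (same untested caveat as above; it would also need an import of org.apache.hadoop.io.NullWritable, plus job.setMapOutputValueClass(NullWritable.class) in run(), since the map and reduce value types then differ):

    public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, NullWritable> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, NullWritable> output,
                        Reporter reporter) throws IOException {

            String[] lineFields = value.toString().split(",");
            Text outputKey = new Text(lineFields[0] + "," + lineFields[4]);

            // No payload is needed just to count occurrences of the key
            output.collect(outputKey, NullWritable.get());
        }
    }
    public static class Reduce extends MapReduceBase
        implements Reducer<Text, NullWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<NullWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {

            int count = 0;
            while (values.hasNext()) {
                values.next();
                count++;
            }
            output.collect(key, new IntWritable(count));
        }
    }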

Upvotes: 1
