Reputation: 33
I am new to Hadoop. My requirement is to process only the first 10 rows of each input file. How can I exit the mapper after reading 10 rows of each file?
If anyone can provide some sample code, it would be a great help.
Thanks in advance.
Upvotes: 2
Views: 1758
Reputation: 1
Suppose N = 10. The following code reads only the first 10 records from a file such as:
line1
line2
.
.
.
line20
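// mapper
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class Mapcls extends Mapper<LongWritable, Text, Text, NullWritable>
{
    // Override run() so the record loop stops after the first 10 rows of the split.
    @Override
    public void run(Context con) throws IOException, InterruptedException
    {
        setup(con);
        int rows = 0;
        // Test the counter first so that an 11th record is never read.
        while (rows < 10 && con.nextKeyValue())
        {
            rows++;
            map(con.getCurrentKey(), con.getCurrentValue(), con);
        }
        cleanup(con);
    }

    // Emit each line as the key; no value is needed.
    @Override
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
    {
        con.write(value, NullWritable.get());
    }
}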
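// driver
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Testjob extends Configured implements Tool
{
    @Override
    public int run(String[] args) throws Exception
    {
        // Use the configuration injected by ToolRunner so -D options are honoured.
        Job job = Job.getInstance(getConf(), "tst001");
        job.setJarByClass(getClass());
        job.setMapperClass(Mapcls.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        // args[0] is the input path, args[1] the output directory.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception
    {
        int rc = ToolRunner.run(new Configuration(), new Testjob(), args);
        System.exit(rc);
    }
}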
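Then the output will be as follows. No reducer is set, so the records pass through the default identity reducer and come out sorted by key as text, which is why line10 appears right after line1: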
line1
line10
line2
line3
line4
line5
line6
line7
line8
line9
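The ordering of that output can be reproduced without a cluster. The following is only a sketch in plain Java, assuming the keys are ASCII so that String ordering matches the byte-wise ordering Hadoop applies to Text keys; the class name KeyOrderSketch and the method sortedLines are hypothetical and not part of the job above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class KeyOrderSketch {
    // Builds "line1".."lineN" and sorts them as plain strings,
    // which is how textual keys are ordered (not numerically).
    static List<String> sortedLines(int n) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            lines.add("line" + i);
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        // prints [line1, line10, line2, line3, line4, line5, line6, line7, line8, line9]
        System.out.println(sortedLines(10));
    }
}
```

If the original file order matters, one option is to run the job with zero reducers (job.setNumReduceTasks(0)), so the map output is written directly without the sort.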
Upvotes: 0
Reputation: 30089
You can override the run method of your mapper and break out of the while loop once the map loop has iterated 10 times. This assumes your files are not splittable; otherwise you'll get the first 10 lines from each split:
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    int rows = 0;
    while (context.nextKeyValue()) {
        if (rows++ == 10) {
            break;
        }
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
}
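To make the caveat about splits concrete, here is a rough simulation in plain Java with no Hadoop dependency. It assumes a 30-line file that is read either as one split or as two splits of 15 lines each; the class SplitLimitSketch and the helpers runMapper and lines are hypothetical stand-ins for the mapper loop and the record reader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SplitLimitSketch {
    // Mimics the overridden run(): keep at most `limit` records from one split.
    static List<String> runMapper(Iterator<String> records, int limit) {
        List<String> kept = new ArrayList<>();
        int rows = 0;
        while (records.hasNext()) {
            String record = records.next();
            if (rows++ == limit) {
                break;
            }
            kept.add(record);
        }
        return kept;
    }

    // Builds "line<from>".."line<to>" as a stand-in for the records of one split.
    static List<String> lines(int from, int to) {
        List<String> result = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            result.add("line" + i);
        }
        return result;
    }

    public static void main(String[] args) {
        // One split covering the whole file: 10 records are kept.
        System.out.println(runMapper(lines(1, 30).iterator(), 10).size()); // prints 10

        // Two splits of 15 lines: each split contributes its own first 10 records.
        int total = runMapper(lines(1, 15).iterator(), 10).size()
                + runMapper(lines(16, 30).iterator(), 10).size();
        System.out.println(total); // prints 20
    }
}
```

In a real job, one way to keep each file in a single split is to subclass TextInputFormat, override isSplitable(JobContext, Path) to return false, and register that subclass with job.setInputFormatClass.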
Upvotes: 3