Reputation: 1610
Here is the background. I have the following input for my MapReduce job (example):
Apache Hadoop
Apache Lucene
StackOverflow
....
(Each line actually represents a user query, but that is not important here.) I want my RecordReader class to read one line and then pass several key-value pairs to the mappers. For example, if the RecordReader reads Apache Hadoop, I want it to generate the following key-value pairs and pass them to the mappers:
Apache Hadoop - 1
Apache Hadoop - 2
Apache Hadoop - 3
("-" is the separator here.) I found that RecordReader passes key-value pairs through its next() method:
next(key, value);
Every time RecordReader.next() is called, only one key and one value are passed as arguments. So how can I emit several pairs for a single input line?
Upvotes: 2
Views: 1675
Reputation: 1212
I think that if you want to send the same key to the mapper several times, you must implement your own RecordReader. For example, you can write a MultiRecordReader that extends LineRecordReader and change its nextKeyValue method. This is the original code from LineRecordReader:
public boolean nextKeyValue() throws IOException {
    if (key == null) {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
        value = new Text();
    }
    int newSize = 0;
    // We always read one extra line, which lies outside the upper
    // split limit i.e. (end - 1)
    while (getFilePosition() <= end) {
        newSize = in.readLine(value, maxLineLength,
            Math.max(maxBytesToConsume(pos), maxLineLength));
        pos += newSize;
        if (newSize < maxLineLength) {
            break;
        }
        // line too long. try again
        LOG.info("Skipped line of size " + newSize + " at pos " +
            (pos - newSize));
    }
    if (newSize == 0) {
        key = null;
        value = null;
        return false;
    } else {
        return true;
    }
}
and you can change it like this:
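// Assumes key and value are both declared as Text, plus a new field:
//   private int n = 0; // copies of the current line emitted so far
// If the LineRecordReader fields are not accessible from a subclass,
// copy the class and edit the copy instead.
public boolean nextKeyValue() throws IOException {
    if (key == null) {
        key = new Text();
    }
    if (value == null) {
        value = new Text();
    }
    // The current line still has copies left: keep the key, bump the counter.
    if (n > 0 && n < 3) {
        n++;
        value.set(Integer.toString(n));
        return true;
    }
    int newSize = 0;
    while (getFilePosition() <= end) {
        // Read the line into key instead of value.
        newSize = in.readLine(key, maxLineLength,
            Math.max(maxBytesToConsume(pos), maxLineLength));
        pos += newSize;
        if (newSize < maxLineLength) {
            break;
        }
        // line too long. try again
        LOG.info("Skipped line of size " + newSize + " at pos " +
            (pos - newSize));
    }
    if (newSize == 0) {
        key = null;
        value = null;
        n = 0;
        return false;
    } else {
        // First copy of a freshly read line.
        n = 1;
        value.set("1");
        return true;
    }
}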
I think this should work for you.
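To make the control flow easier to see, here is a minimal plain-Java sketch of the same idea with no Hadoop classes involved. The class name RepeatingLineReader, the copies parameter, and the use of an Iterator<String> as a stand-in for the input split are all assumptions made for illustration; this is not Hadoop's API. It only shows how a nextKeyValue()-style method can return the same line several times before it reads the next one.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a RecordReader; uses no Hadoop classes.
class RepeatingLineReader {
    private final Iterator<String> lines; // stands in for the input split
    private final int copies;             // pairs to emit per input line
    private String key;                   // current line, null before start and at end
    private int value;                    // counter 1..copies for the current line

    RepeatingLineReader(Iterator<String> lines, int copies) {
        if (copies < 1) {
            throw new IllegalArgumentException("copies must be >= 1");
        }
        this.lines = lines;
        this.copies = copies;
    }

    // Mirrors nextKeyValue(): move to the next pair, or return false at end of input.
    boolean nextKeyValue() {
        if (key != null && value < copies) {
            value++;                      // same line, next counter
            return true;
        }
        if (!lines.hasNext()) {
            key = null;
            return false;
        }
        key = lines.next();               // read a new line only after all copies are out
        value = 1;
        return true;
    }

    String getCurrentKey() { return key; }

    int getCurrentValue() { return value; }

    public static void main(String[] args) {
        RepeatingLineReader reader = new RepeatingLineReader(
            Arrays.asList("Apache Hadoop", "Apache Lucene").iterator(), 3);
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " - " + reader.getCurrentValue());
        }
    }
}
```

The point of the sketch is that the reader keeps a small piece of state (the counter) between calls, so each call still hands back exactly one key and one value, which is what the framework expects.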
Upvotes: 1
Reputation: 418
Try writing the output without a key:
context.write(NullWritable.get(), new Text("Apache Hadoop - 1"));
context.write(NullWritable.get(), new Text("Apache Hadoop - 2"));
context.write(NullWritable.get(), new Text("Apache Hadoop - 3"));
Upvotes: 0
Reputation: 3942
I believe you can simply use this:
public static class MultiMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int i = 1; i <= n; i++) {
            context.write(value, new IntWritable(i));
        }
    }
}
Here n is the number of values you want to emit for each input line. For example, for the key-value pairs you specified:
Apache Hadoop - 1
Apache Hadoop - 2
Apache Hadoop - 3
n would be 3.
Upvotes: 2