Reputation: 91
I am new to Hadoop MapReduce. My input is many text files, and I want to write a MapReduce program that writes every file name and its associated sentences to a single output file. The mapper should just emit the file name (key) and the associated sentences (value); the reducer should collect each key with all of its values and write the file name and its sentences to the output.
Mapper and reducer:
public void map(Text key, Text value,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
StringTokenizer itr = new StringTokenizer(value.toString(), ",");
String filename = new String();
FileSplit filesplit = (FileSplit) reporter.getInputSplit();
filename = filesplit.getpath().getName();
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(new Text(filename), word);
}
}
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
// int sum = 0;
String translation = "";
while (values.hasNext()) {
translation += "|" + values.toString() + "|";
}
results.set(translation);
output.collect(key, results);
}
When I run the above mapper and reducer with the same input format configuration (KeyValueTextInputFormat.class), it does not write anything to the output.
What should I change to achieve my goal?
Upvotes: 0
Views: 274
Reputation: 16390
In your reduce method you declare values to be an Iterator. It should be declared as an Iterable instead.
public void reduce(Text key, Iterable<Text> values, ....
instead of
public void reduce(Text key, Iterator<Text> values, ....
Once you've done that, you can do:
Iterator<Text> iter = values.iterator();
while(iter.hasNext())
{
translation += "|" + iter.next().toString() + "|";
}
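To make the loop above easier to try outside a cluster, here is a minimal plain-Java sketch of the same iteration. It uses String instead of Text, and the class name JoinValuesSketch and the helper joinValues are made up for illustration; they are not part of Hadoop. It also swaps the repeated += for a StringBuilder, which is only a stylistic choice.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class JoinValuesSketch {
    // Hypothetical helper: wraps each value in '|' and concatenates the results,
    // the same string shape the reducer loop builds.
    static String joinValues(Iterable<String> values) {
        StringBuilder translation = new StringBuilder();
        Iterator<String> iter = values.iterator();
        while (iter.hasNext()) {
            // next() returns the current value and advances the iterator;
            // without it the loop would never terminate.
            translation.append("|").append(iter.next()).append("|");
        }
        return translation.toString();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        System.out.println(joinValues(sentences)); // prints "|first sentence||second sentence|"
    }
}
```

Note that the loop calls next() on every pass, whereas the loop in the question calls values.toString() and never advances the iterator.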
Because you used the wrong type, the method isn't overriding the default reduce method, which doesn't do anything. That's why you get no output.
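The override-versus-overload point can be seen in plain Java without Hadoop. The sketch below is only an illustration: BaseReducer, WrongReducer, RightReducer and callThroughBase are hypothetical stand-ins, not Hadoop classes, and the base method here simply returns a marker string so you can see which version ran.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class OverrideSketch {
    // Hypothetical stand-in for a framework base class; this is NOT Hadoop's Reducer.
    static class BaseReducer {
        String reduce(String key, Iterable<String> values) {
            return "base";
        }
    }

    // Declares Iterator instead of Iterable, so this method is an overload:
    // a second, unrelated method that the framework never calls.
    static class WrongReducer extends BaseReducer {
        String reduce(String key, Iterator<String> values) {
            return "custom";
        }
    }

    // Same parameter types as the base method, so this one really overrides it.
    static class RightReducer extends BaseReducer {
        @Override
        String reduce(String key, Iterable<String> values) {
            return "custom";
        }
    }

    // Mimics a framework that only knows the base-class signature.
    static String callThroughBase(BaseReducer reducer) {
        List<String> values = Arrays.asList("a", "b");
        return reducer.reduce("file.txt", values);
    }

    public static void main(String[] args) {
        System.out.println(callThroughBase(new WrongReducer())); // prints "base"
        System.out.println(callThroughBase(new RightReducer())); // prints "custom"
    }
}
```

Putting @Override on your reduce method is a cheap safeguard: if the signature does not match a method in the parent type, the compiler rejects it instead of silently treating it as a new method.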
I also don't see where you declare the variable results.
Upvotes: 2