Reputation: 5
I have an input file contains:
id value
1e 1
2e 1
...
2e 1
3e 1
4e 1
And I would like to find the total id of my input file. So In my main, I have declare a list so that when I read the input file, I will insert the line into the list
MainDriver.java public static Set list = new HashSet();
and I my map
// Apply regex to find the id
...
// Insert id to the list
MainDriver.list.add(regex.group(1)); // add 1e, 2e, 3e ...
and In my reduce, I try to use the list as
public void reduce(WritableComparable key, Iterator values,
OutputCollector output, Reporter reporter) throws IOException
{
...
output.collect(key, new IntWritable(MainDriver.list.size()));
}
So I expect the value print out the file, in this case will be 4. But it actually prints out 0.
I have verify that regex.group(1) would extract valid id. So I have no clue why the size of my list is 0 in the reduce process.
Upvotes: 0
Views: 68
Reputation: 4629
This is basically ignoring the advantage of using MapReduce in the first place.
Correct me if I'm wrong, but it appears you can map your output from your Mapper by "id", and then in your Reducer you receive something like Text key, Iterator values
as the parameters.
You can then just sum up values
and output output.collect(key, <total value>);
Example (apologies for using Context rather than OutputCollector, but the logic is the same):
public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
private final Text key = new Text("id");
private final Text id = new Text();
public void map(LongWritable key, Text value,
Context context) throws IOException, InterruptedException {
id.set(regex.group(1)); // do whatever you do
context.write(id, countOne);
}
}
public static class MyReducer extends Reducer<Text, Text, Text, IntWritable> {
private final IntWritable totalCount = new IntWritable();
public void reduce(Text key, Iterable<Text> values,
Context context) throws IOException, InterruptedException {
int cnt = 0;
for (Text value : values) {
cnt ++;
}
totalCount.set(cnt);
context.write(key, totalCount);
}
}
Upvotes: 0
Reputation: 2725
The mappers and reducers run on separate JVMs (and often separate machines altogether) both from each other and from the driver program, so there is no common instance of your list
Set variable that all of those methods can concurrently read and write to.
One way in MapReduce to count the number of keys is:
(id, 1)
from your mapper1
s for each mapper using a combiner to minimize network and reducer I/Osetup()
initialize a class-scope numeric variable (int or long presumbly) to 0reduce()
increment the counter, and ignore the valuescleanup()
emit the counter value now that all keys have been processedUpvotes: 1