Reputation: 1441
I have 2 files of the form
File 1:
key1 value1
key2 value2
...
File 2:
key1 value3
key2 value4
...
I would like to produce a reduce output of the form
key1 (value1-value3)/value1
key2 (value2-value4)/value2
My map writes the key, with the value prefixed by a character that indicates whether it comes from file1 or file2, but I am not sure how to write the reduce stage.
My map method is:
    public void map(LongWritable key, Text val, Context context) throws IOException, InterruptedException
    {
        Text outputKey = new Text();
        Text outputValue = new Text();
        outputKey.set(key.toString());
        // Placeholder condition from the original post: the intended test is
        // whether the current record comes from file 1
        if ("A")
        {
            outputValue.set("A," + val);
        }
        else
        {
            outputValue.set("B," + val);
        }
        context.write(outputKey, outputValue);
    }
}
Upvotes: 0
Views: 709
Reputation: 3154
This should be simple enough since you already tag the values, although it can be a bit confusing at first. I assume the emitted values look like A23 (for file1) and B139 (for file2). Snippet:
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    // Doubles are used because integer division would truncate (a - b) / a
    double diff = 0;
    double denominator = 1;
    for (Text val : values) {
        String tagged = val.toString();
        if (tagged.startsWith("A")) {
            // Value from file1: it is both the minuend and the denominator
            denominator = Double.parseDouble(tagged.substring(1));
            diff += denominator;
        } else if (tagged.startsWith("B")) {
            // Value from file2: subtract it
            diff -= Double.parseDouble(tagged.substring(1));
        } else {
            // This block shouldn't be reached unless malformed values are emitted
            // Throw an exception or log it
        }
    }
    // The job's output value class must be DoubleWritable for this write
    context.write(key, new DoubleWritable(diff / denominator));
}
Hope this helps. However, I think your approach will fail badly when key1 and key2 are equal.
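The per-key arithmetic of the reducer above can be tried out without a Hadoop cluster. The following is only a minimal sketch in plain Java: the class name `RelativeChangeSketch` and the method `relativeChange` are hypothetical, and it assumes the tagged-value layout described in this answer (`A23`, `B139`). It uses floating-point division so that the ratio is not truncated, and it does not rely on the order in which the values arrive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mirrors the reducer's arithmetic for one key.
final class RelativeChangeSketch {

    // Computes (a - b) / a, where a is the "A"-tagged value and b the "B"-tagged value.
    static double relativeChange(List<String> taggedValues) {
        Double a = null;
        Double b = null;
        for (String v : taggedValues) {
            if (v.startsWith("A")) {
                a = Double.parseDouble(v.substring(1));
            } else if (v.startsWith("B")) {
                b = Double.parseDouble(v.substring(1));
            } else {
                throw new IllegalArgumentException("Unexpected tag in value: " + v);
            }
        }
        if (a == null || b == null) {
            // The key was present in only one of the two files
            throw new IllegalStateException("Need one A value and one B value");
        }
        if (a == 0.0) {
            throw new ArithmeticException("Value from file1 is zero, ratio undefined");
        }
        return (a - b) / a;
    }

    public static void main(String[] args) {
        // Values for one key, in arbitrary order: file2 gave 4, file1 gave 10
        System.out.println(relativeChange(Arrays.asList("B4", "A10")));
    }
}
```

Rejecting keys that appear in only one file is a design choice made for this sketch; a real job might instead skip or log such keys.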
UPDATE
The map should look like the following to work with the above reducer:
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    // Name of the file the current split belongs to
    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    // Each input line has the form "<key> <value>"
    String[] keyVal = val.toString().split("\\s+");
    // Use the key from the line; the LongWritable key is only the byte offset
    // of the line, so it must not overwrite the output key
    Text outputKey = new Text(keyVal[0]);
    Text outputValue = new Text();
    if ("fileA".equals(fileName)) {
        outputValue.set("A" + keyVal[1]);
    } else {
        outputValue.set("B" + keyVal[1]);
    }
    context.write(outputKey, outputValue);
}
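The tagging step in the mapper can also be checked in isolation. This is a rough sketch in plain Java with no Hadoop dependency: the class `TagByFileSketch` and the method `tagLine` are hypothetical names, and `fileA` is the example file name used in the mapper above, not something Hadoop defines.

```java
// Hypothetical helper, not part of Hadoop: mirrors the mapper's per-line tagging.
final class TagByFileSketch {

    // Splits "<key> <value>" and prefixes the value with "A" or "B"
    // depending on which file the line came from.
    static String[] tagLine(String fileName, String line) {
        String[] keyVal = line.trim().split("\\s+");
        if (keyVal.length != 2) {
            throw new IllegalArgumentException("Expected '<key> <value>' but got: " + line);
        }
        // "fileA" is the assumed name of the first input file
        String tag = "fileA".equals(fileName) ? "A" : "B";
        return new String[] { keyVal[0], tag + keyVal[1] };
    }

    public static void main(String[] args) {
        String[] out = tagLine("fileA", "key1 10");
        System.out.println(out[0] + "\t" + out[1]);
    }
}
```

The first array element corresponds to the output key and the second to the tagged output value that the reducer later inspects.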
Upvotes: 1
Reputation: 1
I have found using NamedVector very helpful in such circumstances. This provides an identification for the value so that you can perform required operations on the values based on the "name".
Upvotes: 0