Reputation: 1215
I'm having some trouble accomplishing what I thought would be a simple task. I'm trying to iterate over a file with two ints per line. The goal is to use the first integer as a key and to store the second integers in a list under that key, adding a value only if it is not already present in the list. So, if the file looks like this:
3 11
4 7
5 10
5 6
6 5
6 10
3 11 #should be ignored
Then ideally, I'd have something like this at the end:
3 [11]
4 [7]
5 [10, 6]
6 [5, 10]
What would be the best way of going about this in terms of the data structure used to store the values? I know I could use ArrayWritable, but I don't think you can dynamically add values to it. I don't care about the order of the keys.
Upvotes: 0
Views: 108
Reputation: 7507
Your problem is very similar to the classic WordCount example, except that instead of emitting a sum you want to emit each value a single time. As for the data structure, the values for a key already arrive inside one, the Iterable passed to the reducer, so there is no need to copy them into a new collection. All you really need to do is write them out in whatever form you need. Below I'll explain what I think you will need for the entire program.
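For the mapper, you want the identity mapper: it outputs each key-value pair exactly as it was read in. You can use the IdentityMapper class, or simply not specify a mapper if you are using the new API, 0.23+. Note that this relies on the input format handing the mapper the first integer as the key and the second as the value, as KeyValueTextInputFormat does once its separator is set to a space (the default is a tab). The default TextInputFormat passes the byte offset as the key and the whole line as the value.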
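If the input format hands the mapper whole lines rather than pre-split pairs, a custom mapper would have to do the split itself. Here is a minimal sketch of just that parsing step in plain Java, with the Hadoop types left out so it runs on its own. The class and method names (PairSplitter, split) are made up for illustration and are not part of Hadoop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not a Hadoop class: turns "5 10" into key "5", value "10".
final class PairSplitter {
    static Map.Entry<String, String> split(String line) {
        // Split on the first run of whitespace; the limit of 2 keeps the rest intact.
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected two fields: " + line);
        }
        return new SimpleEntry<>(parts[0], parts[1]);
    }
}
```

Inside a real map method you would presumably wrap the two strings in Text objects and pass them to context.write, so the reducer receives the first integer as its key.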
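For the reducer, you should do something similar to the following. It appends the values for a given key, separated by commas. You still don't need a list to hold them, since they already arrive in the Iterable; the only extra state is a small set that remembers which values have already been appended, so a repeated pair such as 3 11 is written only once. Once the reducer is done appending the values for a key, it emits the key with the comma-delimited values.
@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    StringBuilder sb = new StringBuilder();
    Set<String> seen = new HashSet<>();  // needs java.util.HashSet and java.util.Set
    boolean first = true;
    for (Text value : values) {
        // Hadoop reuses the Text object between iterations, so copy it to a String.
        String v = value.toString();
        if (!seen.add(v)) continue;  // already appended for this key
        if (!first) sb.append(", ");
        else first = false;
        sb.append(v);
    }
    // Emit the key with its comma-delimited distinct values.
    context.write(key, new Text(sb.toString()));
}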
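To check the grouping logic without a cluster, the whole flow can be simulated in plain Java on the sample input from the question. This is only a sketch: GroupDistinct and group are hypothetical names, the TreeMap stands in for the shuffle's grouping by key, and a LinkedHashSet per key drops repeated values while keeping first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Local stand-in for the map, shuffle and reduce steps; no Hadoop types involved.
final class GroupDistinct {
    static Map<Integer, Set<Integer>> group(List<String> lines) {
        // TreeMap keeps keys sorted, loosely mimicking the shuffle's sort by key.
        Map<Integer, Set<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            int key = Integer.parseInt(parts[0]);
            int value = Integer.parseInt(parts[1]);
            // LinkedHashSet ignores repeats and remembers insertion order.
            grouped.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "3 11", "4 7", "5 10", "5 6", "6 5", "6 10", "3 11");
        // Prints one "key [values]" line per key.
        group(lines).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

In a real job the order of values within a key is not guaranteed unless you set up a secondary sort, so the order inside each list may differ from what this local run shows.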
Upvotes: 1