Reputation:
I have written a simple run method that should be equivalent to the default run method in the Reducer class, but something totally weird happens.
Here is the default run method:
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
}
output:
New reducer: 0
Reducer: 0:9,2:5
end of this reducer
Reducer: 0:9,5:7
end of this reducer
... (lots of keys)
Reducer: 7:7,6:7
end of this reducer
Reducer: 7:7,7:6
end of this reducer
and here is my overridden method:
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    HashMap<Text, HashSet<Text>> map = new HashMap<Text, HashSet<Text>>();
    while (context.nextKey()) {
        //reduce(context.getCurrentKey(), context.getValues(), context);
        Text key = context.getCurrentKey();
        map.put(key, new HashSet<Text>());
        for (Text v : context.getValues()) {
            map.get(key).add(v);
        }
    }
    for (Text k : map.keySet()) {
        reduce(k, map.get(k), context);
    }
    cleanup(context);
}
output:
New reducer: 0
Reducer: 7:7,7:6
end of this reducer
... (lots of keys)
Reducer: 7:7,7:6
end of this reducer
My problem is that if I copy the keys and values into the HashMap first, nothing works properly: the reduce call at the end passes the same key (the first one stored in the HashMap) again and again. Can anyone help me? How can I make this work properly? I need to do this because I want to pre-process the keys before sending them to the reducers. Thanks in advance!
Upvotes: 0
Views: 741
Reputation: 20969
Hadoop reuses the Writable objects: the framework fills the same Text instances with new data on every iteration, so you need to create copies before putting them into your collection.
Changing your code to copy things would look like this:
while (context.nextKey()) {
    Text key = new Text(context.getCurrentKey());
    map.put(key, new HashSet<Text>());
    for (Text v : context.getValues()) {
        map.get(key).add(new Text(v));
    }
}
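For reference, here is a sketch of how the complete overridden run method might look with the copies applied (same structure as the code in the question, only using the Text copy constructor for both keys and values):

@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // Buffer all keys and values, copying each Writable because Hadoop
    // reuses the same Text instances across iterations.
    HashMap<Text, HashSet<Text>> map = new HashMap<Text, HashSet<Text>>();
    while (context.nextKey()) {
        Text key = new Text(context.getCurrentKey());
        map.put(key, new HashSet<Text>());
        for (Text v : context.getValues()) {
            map.get(key).add(new Text(v));
        }
    }
    // The buffered keys are now distinct objects, so each reduce call
    // sees its own key and its own value set.
    for (Text k : map.keySet()) {
        reduce(k, map.get(k), context);
    }
    cleanup(context);
}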
Upvotes: 1