Reputation: 1391
I'm starting out with Hadoop 0.20.2 and began with the basic word-count problem, using the code I found here: http://cxwangyi.blogspot.com/2009/12/wordcount-tutorial-for-hadoop-0201.html
This works like it should. However, the words are spread over multiple files and I want to count words per file, so I changed the mapper to:
String fileName = ((org.apache.hadoop.mapreduce.lib.input.FileSplit) context.getInputSplit()).getPath().getName();
word.set(itr.nextToken()+"@"+fileName);
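The key-building logic from those two lines can be sketched in plain Java, independent of Hadoop (the line and file name below are hypothetical values standing in for what the mapper would receive from `context`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TagWords {
    // Build the "word@fileName" keys the modified mapper emits,
    // one per token in the input line.
    static List<String> tag(String line, String fileName) {
        List<String> keys = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            keys.add(itr.nextToken() + "@" + fileName);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical input: one line from a file named "file1"
        System.out.println(tag("word1 word2 word1", "file1"));
        // prints [word1@file1, word2@file1, word1@file1]
    }
}
```

With keys like these, the reducer sums counts per (word, file) pair instead of per word.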
But then I get duplicate entries in my output, like this: word1@file1 1 word2@file2 1 word2@file2~ 1 ...
The word2@file2~ 1 entry should not be there...
Does anybody know what I'm doing wrong?
Thanks
Upvotes: 0
Views: 444
Reputation: 628
Are you sure you don't have a file with a tilde at the end of its name in the input directory of the Hadoop job? Some editors, such as gedit, create such backup copies (file2~ alongside file2) every time the file is edited.
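A quick way to check, and to clean the backups out before submitting the job (a local directory named `input` is used here as a stand-in for your job's input path):

```shell
# Reproduce the situation: a real input file plus an editor backup copy.
mkdir -p input
echo "word2" > input/file2
echo "word2" > input/file2~   # the backup an editor like gedit would leave behind

# Both files would be picked up as input by the job:
ls input

# Delete the tilde backups, then verify only the real file remains:
find input -name '*~' -delete
ls input
```

After the `find -name '*~' -delete`, only `file2` is left, so the mapper would no longer emit keys such as word2@file2~.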
Upvotes: 2