Reputation: 15942
I'm trying to run
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar aggregatewordcount /data/gutenberg/huckfinn.txt output/guten4
but I get the error "huckfinn.txt not a SequenceFile".
From other sites and from the source of this example, I can see that there is a textinputformat argument that I'm guessing fixes this, but I can't figure out how to specify it.
If I run
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar aggregatewordcount /data/gutenberg/huckfinn.txt output/guten5 2 textinputformat
I get a different error: "java.lang.RuntimeException: Error in configuring object".
Upvotes: 1
Views: 3739
Reputation: 33495
In ValueAggregatorJob, the following check is done:
    int numOfReducers = 1;
    if (args.length > 2) {
        numOfReducers = Integer.parseInt(args[2]);
    }
    ..............
    if (args.length > 3 &&
        args[3].compareToIgnoreCase("textinputformat") == 0) {
        theInputFormat = TextInputFormat.class;
    } else {
        theInputFormat = SequenceFileInputFormat.class;
    }
If the literal string textinputformat is not passed as the fourth argument, the input format defaults to SequenceFileInputFormat, hence the "huckfinn.txt not a SequenceFile" error. Likewise, the number of reducers defaults to 1 if it is not specified.
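To make the effect of the arguments concrete, here is a minimal standalone sketch of the same check. It does not use Hadoop at all: the class name AggregateArgsSketch and the two helper methods are made up for illustration, and the input formats are returned as plain strings instead of classes.

```java
// Hypothetical sketch: mirrors the argument check quoted from ValueAggregatorJob,
// but returns names as strings so it runs without any Hadoop jars.
class AggregateArgsSketch {

    // args[2], if present, is the number of reducers; otherwise 1 is used.
    static int numReducers(String[] args) {
        return args.length > 2 ? Integer.parseInt(args[2]) : 1;
    }

    // args[3] must be the literal "textinputformat" (case-insensitive) to get
    // text input; anything else, or nothing, falls back to sequence files.
    static String inputFormat(String[] args) {
        if (args.length > 3 && args[3].compareToIgnoreCase("textinputformat") == 0) {
            return "TextInputFormat";
        }
        return "SequenceFileInputFormat";
    }

    public static void main(String[] argv) {
        String[] first = {"/data/gutenberg/huckfinn.txt", "output/guten4"};
        String[] second = {"/data/gutenberg/huckfinn.txt", "output/guten5", "2", "textinputformat"};
        // prints "SequenceFileInputFormat, reducers=1"
        System.out.println(inputFormat(first) + ", reducers=" + numReducers(first));
        // prints "TextInputFormat, reducers=2"
        System.out.println(inputFormat(second) + ", reducers=" + numReducers(second));
    }
}
```

Because the format is read from the fourth position, the reducer count has to be given as well; passing textinputformat as the third argument would make Integer.parseInt fail instead.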
Use the following command to run the job:
    hadoop jar hadoop-mapred-examples-0.21.0.jar aggregatewordcount /user/praveensripati/input/sample.txt /user/praveensripati/output 2 textinputformat
Note that the examples jar usually has a version number in its name, as in hadoop-mapred-examples-0.21.0.jar, and is located in the root of the Hadoop install. Make sure that the file /usr/lib/hadoop/hadoop-examples.jar is present.
To resolve the "java.lang.RuntimeException: Error in configuring object" error, please check the log files for a stack trace and post it back.
Upvotes: 1
Reputation: 13801
According to the mailing list post linked from your question, the "java.lang.RuntimeException: Error in configuring object" exception is caused by the example's dependencies not being on the tasktracker's classpath. You can see this from the full traceback. When I run your second command on my machine, I get:
java.lang.RuntimeException: Error in configuring object
[...]
Caused by: java.lang.reflect.InvocationTargetException
[...]
Caused by: java.lang.RuntimeException: Error in configuring object
[...]
Caused by: java.lang.reflect.InvocationTargetException
[...]
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass
[...]
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass
[...]
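The useful part of a chain like this is the innermost "Caused by". As a rough illustration of reading it programmatically, the following standalone sketch rebuilds a similar chain by hand and walks getCause() to the bottom. The class RootCauseSketch and the rootCause helper are hypothetical, and the exceptions are constructed locally rather than thrown by Hadoop.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical sketch: finds the innermost cause of a nested exception chain.
class RootCauseSketch {

    // Follow getCause() until there is no further cause.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Hand-built chain shaped like the trace above (not produced by Hadoop).
        Throwable missing = new ClassNotFoundException(
                "org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass");
        Throwable inner = new RuntimeException(missing);
        Throwable reflective = new InvocationTargetException(inner);
        Throwable outer = new RuntimeException("Error in configuring object", reflective);

        // The outer message is generic; the root cause names the missing class.
        System.out.println(rootCause(outer));
    }
}
```

In this case the root cause is a ClassNotFoundException for a class inside the examples jar, which is what points at a classpath problem rather than a problem with the input file.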
This post on the Cloudera blog discusses the different methods of providing dependencies to the tasktrackers.
To run the aggregatewordcount example, I used the -libjars option:
    hadoop jar hadoop-examples.jar aggregatewordcount -libjars hadoop-examples.jar /data/gutenberg/huckfinn.txt output/guten7 2 textinputformat
Upvotes: 1