Dolan Antenucci

Reputation: 15942

Hadoop word count example fails with 'not a SequenceFile'. How do I set the file format?

I'm trying to run hadoop jar /usr/lib/hadoop/hadoop-examples.jar aggregatewordcount /data/gutenberg/huckfinn.txt output/guten4 but get an error "huckfinn.txt not a SequenceFile".

From reading other sites and the source of this example, I see there is an argument, textinputformat, that I'm guessing fixes this. I can't figure out how to specify it, though.

If I run hadoop jar /usr/lib/hadoop/hadoop-examples.jar aggregatewordcount /data/gutenberg/huckfinn.txt output/guten5 2 textinputformat, I get a different error, "java.lang.RuntimeException: Error in configuring object"

Upvotes: 1

Views: 3739

Answers (2)

Praveen Sripati

Reputation: 33495

ValueAggregatorJob performs the following checks on its arguments:

int numOfReducers = 1;
if (args.length > 2) {
  numOfReducers = Integer.parseInt(args[2]);
}

..............

if (args.length > 3 && 
    args[3].compareToIgnoreCase("textinputformat") == 0) {
  theInputFormat = TextInputFormat.class;
} else {
  theInputFormat = SequenceFileInputFormat.class;
}

If the literal string textinputformat is not passed as the fourth argument, the input format defaults to SequenceFileInputFormat, which is why you get the "huckfinn.txt not a SequenceFile" error. Likewise, the number of reducers defaults to 1 if the third argument is not specified.
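To make the defaults concrete, here is a minimal, Hadoop-free sketch of the same checks. The class and method names (InputFormatArgsSketch, numReducers, inputFormatName) are made up for illustration, the format is returned as a plain string instead of a Class, and it assumes the job sees the arguments that follow the example name (input path, output path, reducer count, format).

```java
public class InputFormatArgsSketch {
    // Same rule as the quoted check: the third argument, if present, is the reducer count.
    static int numReducers(String[] args) {
        int numOfReducers = 1;
        if (args.length > 2) {
            numOfReducers = Integer.parseInt(args[2]);
        }
        return numOfReducers;
    }

    // Same rule as the quoted check: only a fourth argument equal to
    // "textinputformat" (ignoring case) selects the text input format.
    static String inputFormatName(String[] args) {
        if (args.length > 3 && args[3].compareToIgnoreCase("textinputformat") == 0) {
            return "TextInputFormat";
        }
        return "SequenceFileInputFormat";
    }

    public static void main(String[] args) {
        // Arguments from the question's first and second commands.
        String[] first = {"/data/gutenberg/huckfinn.txt", "output/guten4"};
        String[] second = {"/data/gutenberg/huckfinn.txt", "output/guten5", "2", "textinputformat"};
        System.out.println(numReducers(first) + " reducer(s), " + inputFormatName(first));
        System.out.println(numReducers(second) + " reducer(s), " + inputFormatName(second));
    }
}
```

Because the format is only read from the fourth position, the reducer count cannot be omitted when textinputformat is given.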

Use the following command to run the job

hadoop jar hadoop-mapred-examples-0.21.0.jar aggregatewordcount /user/praveensripati/input/sample.txt /user/praveensripati/output 2 textinputformat

Note that the examples jar usually has a version number in its name (here, hadoop-mapred-examples-0.21.0.jar) and is located in the root of the Hadoop install. Make sure that the file /usr/lib/hadoop/hadoop-examples.jar is actually present.

To resolve the java.lang.RuntimeException: Error in configuring object, please check the log files for a stack trace and post it back.

Upvotes: 1

Josh Rosen

Reputation: 13801

According to the mailing list post linked from your question, the java.lang.RuntimeException: Error in configuring object exception is caused by the example's dependencies not being on the tasktracker's classpath. You can see this from the full traceback: when I run your second command on my machine, I get:

java.lang.RuntimeException: Error in configuring object
    [...]
Caused by: java.lang.reflect.InvocationTargetException
    [...]
Caused by: java.lang.RuntimeException: Error in configuring object
    [...]
Caused by: java.lang.reflect.InvocationTargetException
    [...]
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass
    [...]
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass
    [...]
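The generic "Error in configuring object" message hides the real problem, which only appears at the bottom of the cause chain. As a rough illustration (not part of Hadoop; the class name RootCauseSketch and its rootCause helper are hypothetical), the innermost cause of a nested exception like the one above can be found by following getCause() until it returns null:

```java
import java.lang.reflect.InvocationTargetException;

public class RootCauseSketch {
    // Follow the cause chain until there is no further cause.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // A hand-built chain that imitates the shape of the traceback above.
        Throwable chain = new RuntimeException("Error in configuring object",
            new InvocationTargetException(
                new RuntimeException(
                    new ClassNotFoundException(
                        "org.apache.hadoop.examples.AggregateWordCount$WordCountPlugInClass"))));
        // Prints the innermost exception rather than the generic wrapper.
        System.out.println(rootCause(chain));
    }
}
```

In practice, reading the last "Caused by:" line of the task log gives the same information.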

This post on the Cloudera blog discusses the different methods of providing dependencies to the tasktrackers.

To run the aggregatewordcount example, I used the -libjars option:

hadoop jar hadoop-examples.jar aggregatewordcount -libjars hadoop-examples.jar /data/gutenberg/huckfinn.txt output/guten7 2 textinputformat

Upvotes: 1
