Roger Garzon Nieto

Hadoop 1.0.3 and Nutch 1.5 issue

I get the following error when I try to run Nutch 1.5 on Hadoop 1.0.3:

hadoop jar nutch-1.5.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5

**Caused by: java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus**

I found the bug report https://issues.apache.org/jira/browse/NUTCH-1084, filed against Nutch 1.3, but it does not seem to be resolved yet. Any help is appreciated.

I followed these tutorials:

http://wiki.apache.org/nutch/NutchHadoopTutorial

http://wiki.apache.org/nutch/NutchTutorial

http://wiki.apache.org/hadoop/HowToConfigure

EDIT

I followed this tutorial http://www.rui-yang.com/develop/build-nutch-1-4-cluster-with-hadoop/ and it works for me. I don't know exactly what fixed the problem. I run Hadoop on a single node. I made these changes:

1. Copied hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, masters and slaves from hadoop/conf to nutch/conf and rebuilt Nutch.

2. Set the classpath: `export CLASSPATH=:$NUTCH_HOME/runtime/local/lib`
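The two steps above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the exact procedure from the linked tutorial: it assumes `HADOOP_HOME` points at the Hadoop 1.0.3 install and `NUTCH_HOME` at the Nutch 1.5 source tree, that Nutch is rebuilt with the Ant `runtime` target, and `copy_hadoop_conf` is a hypothetical helper name. Note that Hadoop 1.x names the master list file `masters` (plural).

```shell
#!/bin/sh
# Sketch of the two changes listed above (copy_hadoop_conf is a hypothetical helper).
# Assumes HADOOP_HOME = Hadoop 1.0.3 install, NUTCH_HOME = Nutch 1.5 source tree.

# Copy the Hadoop 1.x configuration files from one conf dir into another,
# skipping any file that does not exist in the source directory.
copy_hadoop_conf() {
  src_dir=$1
  dst_dir=$2
  for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml masters slaves; do
    if [ -f "$src_dir/$f" ]; then
      cp "$src_dir/$f" "$dst_dir/"
    fi
  done
}

# Only touch real installs when both variables are set.
if [ -n "$HADOOP_HOME" ] && [ -n "$NUTCH_HOME" ]; then
  # Step 1: copy the Hadoop config into Nutch, then rebuild
  # (the Ant "runtime" target is an assumption about the Nutch 1.x build).
  copy_hadoop_conf "$HADOOP_HOME/conf" "$NUTCH_HOME/conf" &&
    (cd "$NUTCH_HOME" && ant runtime)
  # Step 2: the classpath setting from the question, unchanged.
  export CLASSPATH=":$NUTCH_HOME/runtime/local/lib"
fi
```

If you try this, source the script (for example `. ./setup-nutch.sh`, a hypothetical file name) rather than executing it: an `export` in a child shell is lost when that shell exits, so CLASSPATH would not remain set in your terminal.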

I have written up the steps in this tutorial: http://dataspider.blogspot.com.es/2012/09/instalacion-de-hadoop.html

Answers (1)

Badal Singh

If you want to use Hadoop 1.0.3, use Nutch 1.5.1 instead of 1.5.

Check out the release notes for Nutch 1.5.1: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12321850

They don't say whether NUTCH-1084 was fixed in this version, but the following patch was included in this release: https://issues.apache.org/jira/browse/NUTCH-1398
