Error while crawling in Apache Nutch

Question

I have installed Apache Nutch 2.3.1 on top of a Hadoop 2.5.2 multi-node cluster (AWS EC2 machines). I have configured the Nutch files accordingly on the master node, and I have moved the seed.txt file (which lists the URLs to be crawled) from the master node to HDFS. I then run the following command to crawl:

bin/hadoop jar /home/ubuntu/nutch/runtime/deploy/apache-nutch-2.3.1.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 1 -topN 5

I get the following error:

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
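
The trace shows RunJar failing to load the class by name, so one thing worth checking is whether the job file actually contains that class. Below is a minimal sketch of such a check, assuming the .job file is an ordinary zip/jar archive. The class name JarClassCheck and the method containsClass are hypothetical helpers written for this illustration; they are not part of Nutch or Hadoop.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Nutch or Hadoop): reports whether an
// archive contains the .class entry for a fully qualified class name.
public class JarClassCheck {

    static boolean containsClass(String archivePath, String className) throws IOException {
        // org.apache.nutch.crawl.Crawl -> org/apache/nutch/crawl/Crawl.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(archivePath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <archive> <fully.qualified.ClassName>");
            System.exit(2);
        }
        System.out.println(containsClass(args[0], args[1]) ? "found" : "missing");
    }
}
```

It could be run, for example, as `java JarClassCheck /home/ubuntu/nutch/runtime/deploy/apache-nutch-2.3.1.job org.apache.nutch.crawl.Crawl`. If the class is reported as missing, the cause would be the contents of the job file and not the Java version.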

I have Java 1.8.0_151 installed. It appears that the Crawl class cannot be found with this Java version. Should I replace Java 1.8 with Java 1.7, or is the problem something else?

Please help me resolve this issue.
