Reputation: 1
I have followed the tutorial and configured Nutch to run on Windows 7 using Cygwin, and I'm using Solr 5.4.0 to index the data.
However, Nutch 1.11 is failing when executing a crawl.
Crawl Command: $ bin/crawl -i -D solr.server.url=http://127.0.0.1:8983/solr /urls /TestCrawl 2
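For reference, the seed directory was created following the tutorial, roughly like this (the URL in seed.txt is only an example):
$ mkdir -p /urls
$ echo "http://nutch.apache.org/" > /urls/seed.txt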
Error/Exception
Injecting seed URLs
/apache-nutch-1.11/bin/nutch inject /TestCrawl/crawldb /urls
Injector: starting at 2016-01-19 17:11:06
Injector: crawlDb: /TestCrawl/crawldb
Injector: urlDir: /urls
Injector: Converting injected urls to crawl db entries.
Injector: java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
    at org.apache.nutch.crawl.Injector.run(Injector.java:379)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Error running:
/home/apache-nutch-1.11/bin/nutch inject /TestCrawl/crawldb /urls
Failed with exit value 127.
Upvotes: 0
Views: 1680
Reputation: 1818
The hadoop-core jar file is needed when you are working with Nutch. The hadoop-core jar compatible with Nutch 1.11 is version 0.20.0.
Please download the jar from this link: http://www.java2s.com/Code/Jar/h/Downloadhadoop0200corejar.htm
Paste that jar into the "C:\cygwin64\home\apache-nutch-1.11\lib" folder and it will run successfully.
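Assuming the jar from that page was saved as hadoop-0.20.0-core.jar in your Windows Downloads folder (the filename and Windows user name here are only examples), copying it into the Nutch lib folder from a Cygwin shell looks roughly like this:
$ cp /cygdrive/c/Users/<user>/Downloads/hadoop-0.20.0-core.jar /home/apache-nutch-1.11/lib/
$ ls /home/apache-nutch-1.11/lib/ | grep hadoop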
Upvotes: 0
Reputation: 782
I can see there are multiple problems with your command; try this:
bin/crawl -i -Dsolr.server.url=http://127.0.0.1:8983/solr/core_name path_to_seed crawl 2
The first problem is that there is a space when you pass the Solr parameter. The second problem is that the Solr URL should include the core name as well.
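For example, assuming a core named nutch (the core name is just an illustration; create one first if it doesn't exist), the commands would look like this:
$ bin/solr create -c nutch        (run from the Solr 5.4.0 install directory)
$ bin/crawl -i -Dsolr.server.url=http://127.0.0.1:8983/solr/nutch urls TestCrawl 2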
Upvotes: 1