Reputation: 41
I am new to Nutch and Solr integration.
I want to crawl new URLs, so I installed Solr 4.6.0 and Nutch 1.6 on Ubuntu. I started with some configuration, but I still get this error:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text
In the log files I get this error:
2015-05-29 03:05:41,153 ERROR security.UserGroupInformation - PriviledgedActionException as:cloudera cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text
2015-05-29 03:05:41,153 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data
Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text
What does this mean? Could you please explain what the issue is and how I can solve it?
I would highly appreciate your help.
Upvotes: 0
Views: 1489
Reputation: 782
If you are running the bin/crawl script on Mac OS or another Unix-based operating system such as FreeBSD, try switching to Ubuntu. I believe this is a bug in the crawl script. I ran into the same problem before and worked around it by using Ubuntu instead.
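Whichever system you are on, it may help to first confirm which of the four directories named in the exception are actually missing from the segment. The following is only a minimal diagnostic sketch in Python, not part of Nutch: the function name `missing_segment_dirs` is made up for illustration, and the list of subdirectories is taken directly from the error message in the question.

```python
from pathlib import Path
import sys

# Subdirectories named in the InvalidInputException from the question.
EXPECTED_SUBDIRS = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def missing_segment_dirs(segment_path):
    """Return the expected subdirectories that do not exist under segment_path.

    Hypothetical helper for illustration; it only inspects the local filesystem.
    """
    segment = Path(segment_path)
    return [name for name in EXPECTED_SUBDIRS if not (segment / name).is_dir()]


if __name__ == "__main__":
    # Pass the segment path from the error message as the first argument;
    # falls back to the current directory if none is given.
    segment = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_segment_dirs(segment)
    if missing:
        print("Missing under " + segment + ": " + ", ".join(missing))
    else:
        print("All expected subdirectories are present under " + segment)
```

Running it against the timestamped path from the error (the one directly under bin/) shows whether that directory exists at all or whether only some of its subdirectories were never written, for example because the fetch or parse step did not complete.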
Upvotes: 1