NAJMI

Reputation: 41

Error : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist

I am new to Nutch and Solr integration.

I want to crawl new URLs, so I installed Solr 4.6.0 and Nutch 1.6 on Ubuntu. I started with some basic configuration, but I keep getting this error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text

The log file shows the same error:

2015-05-29 03:05:41,153 ERROR security.UserGroupInformation - PriviledgedActionException as:cloudera cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text

2015-05-29 03:05:41,153 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_fetch

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/crawl_parse

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_data

Input path does not exist: file:/home/cloudera/apache-nutch-1.6/bin/20150529030452/parse_text

What does this error mean? Can you please explain what the issue is and how I can solve it?

I would greatly appreciate your help.

Upvotes: 0

Views: 1489

Answers (1)

aalbahem

Reputation: 782

If you are running bin/crawl on Mac OS or another Unix-based operating system such as FreeBSD, try switching to Ubuntu. I believe this is a bug in the crawl script. I ran into the same error before and worked around it by using Ubuntu instead.
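Whichever platform you are on, it may help to first confirm which of the directories named in the error are actually missing. The following is only a minimal sketch, not part of Nutch: the helper name `missing_segment_parts` is made up for illustration, the list of subdirectories is taken from the error message in the question, and the path is copied from that message and will need adjusting for your setup.

```python
from pathlib import Path

# Subdirectory names taken from the error message in the question.
EXPECTED_PARTS = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def missing_segment_parts(segment_dir, expected=EXPECTED_PARTS):
    """Return the expected subdirectory names that are absent from segment_dir.

    Hypothetical helper for illustration; it only inspects the local filesystem.
    """
    segment = Path(segment_dir)
    return [name for name in expected if not (segment / name).is_dir()]


if __name__ == "__main__":
    # Path copied from the error message; adjust it for your installation.
    segment = "/home/cloudera/apache-nutch-1.6/bin/20150529030452"
    missing = missing_segment_parts(segment)
    if missing:
        print("Missing under " + segment + ": " + ", ".join(missing))
    else:
        print("All expected subdirectories are present in " + segment)
```

If all four names are reported missing, or the timestamped directory itself does not exist under bin/, the indexing step is probably being pointed at the wrong location rather than at a real segment, which would be worth checking before changing operating systems.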

Upvotes: 1
