t1w

Reputation: 1428

Nutch 1.7 JAVA_HOME not set Error

I'm experimenting with Apache Nutch 1.7 and Solr on Ubuntu 14.04 LTS x64 (AMD), and when I try to run Nutch it gives me this error message:

Error: JAVA_HOME is not set.

But when I run echo $JAVA_HOME in the terminal, it prints this path: /usr/lib/jvm/java-7-openjdk-amd64

Below you can see what I've done, step by step. How can I fix this?

P.S. Ubuntu runs as a virtual machine on a Mac under Oracle VirtualBox.

  1. Installing Java from the terminal with sudo apt-get -y install openjdk-7-jdk
  2. Checking the Java installation with the java -version command
  3. Setting JAVA_HOME with:

  4. sudo nano /etc/environment

  5. Then typing the following line at the bottom of the file: JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"

  6. Ctrl+X to exit nano, saving the changes.

  7. Then this command: source /etc/environment

  8. Now JAVA_HOME should be set. I checked it with echo $JAVA_HOME, and the output is the same path as above.

  9. Then I installed Solr with sudo apt-get -y install solr-tomcat

  10. I checked the installation by opening http://localhost:8080/solr in a browser, and it shows the initial Solr page.

  11. I downloaded Apache Nutch 1.7 from http://nutch.apache.org; the file was named apache-nutch-1.7-bin.tar.gz

  12. Then I extracted it: tar -zxvf apache-nutch-1.7-bin.tar.gz

  13. I verified the Nutch installation with cd apache-nutch-1.7 and then bin/nutch. The output looks like Usage: nutch COMMAND where......

  14. Then I edited my conf/nutch-site.xml file as described here: Link (look under the title "3) Set Up Your Nutch-Site.Xml"). The only things I did differently from that reference are the MyBot and MyBot,* fields: instead of MyBot I wrote mySpider.

  15. Then I went into Nutch's conf directory in the terminal. Here's what I did next: mkdir -p urls , cd urls , touch seed.txt , nano seed.txt

  16. I wrote only this URL in the file, as suggested in the official Nutch tutorial: http://nutch.apache.org

  17. After I saved my changes to seed.txt, I edited the conf/regex-urlfilter.txt file. I deleted these two lines:

# accept anything else

+.

Then I wrote this in their place:

+^http://([a-z0-9]*\.)*nutch.apache.org/

After that, I ran this command, as suggested in the tutorial: bin/nutch crawl urls -dir crawl -depth 3 -topN 5

After this command I see this error message: Error: JAVA_HOME is not set.

I also found this article, but it didn't solve my problem either: Nutch - Getting Error: JAVA_HOME is not set. when trying to crawl

Upvotes: 1

Views: 2669

Answers (1)

betolink

Reputation: 113

First try: readlink -f $(which java)

That shows where your java binary actually lives, which tells you what JAVA_HOME should be. You should see something like:

  /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
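To turn that resolved path into a JAVA_HOME candidate, the trailing /bin/java has to be dropped. A minimal sketch follows; java_home_from_bin is a hypothetical helper name used only for illustration, and it assumes the path really ends in bin/java as in the sample output above.

```shell
# Hypothetical helper (not part of Nutch or Java): given the resolved path of
# the java executable, print the directory two levels up, i.e. the path with
# the trailing /bin/java removed.
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

# Using the sample path shown above:
java_home_from_bin /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# prints /usr/lib/jvm/java-7-openjdk-amd64/jre
```

On a real machine you would pass "$(readlink -f "$(which java)")" as the argument instead of the literal path.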

Then use that location to set JAVA_HOME just before you call the crawl script, i.e.:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre/ 
bin/nutch crawl urls -dir crawl -depth 3 -topN 5

Note that the value should point to the JRE directory inside a valid JDK location.
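One likely reason echo $JAVA_HOME printed a path while bin/nutch still complained: lines in /etc/environment are plain NAME="value" assignments, so source /etc/environment only creates a shell variable. Unless it is exported, child processes such as the bin/nutch script do not inherit it. The sketch below shows the difference with a throwaway variable; DEMO_JAVA_HOME is a made-up name so that your real JAVA_HOME is left alone.

```shell
# DEMO_JAVA_HOME is a hypothetical variable used only for this demonstration.
unset DEMO_JAVA_HOME

# Plain assignment, which is what sourcing /etc/environment amounts to:
DEMO_JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
echo "current shell: $DEMO_JAVA_HOME"

# A child process does not see a variable that was never exported.
unexported=$(sh -c 'echo "${DEMO_JAVA_HOME:-not set}"')
echo "child before export: $unexported"

# After export, the same child process inherits it.
export DEMO_JAVA_HOME
exported=$(sh -c 'echo "${DEMO_JAVA_HOME:-not set}"')
echo "child after export: $exported"
```

The first child reports "not set" and the second reports the path, which is why the export line above makes a difference.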

P.S. You are missing the Solr URL parameter (in case you want to index the crawled documents, of course).
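As a rough sketch of what that could look like: the Nutch 1.x tutorial passes the Solr URL to the crawl command with -solr. The URL below is an assumption taken from the address the question opened in a browser; adjust it to wherever your Solr instance actually answers. The snippet only prints the command as a dry run, so nothing is crawled.

```shell
# Assumption: Solr is reachable at the URL mentioned in the question.
SOLR_URL="http://localhost:8080/solr"

# The original crawl command with the Solr URL added via -solr.
CRAWL_CMD="bin/nutch crawl urls -solr $SOLR_URL -dir crawl -depth 3 -topN 5"

# Dry run: print the command for review instead of executing it.
echo "$CRAWL_CMD"
```

Once it looks right, run the printed command from the Nutch directory with JAVA_HOME exported as above.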

Upvotes: 1
