Reputation: 1428
I'm experimenting with Apache Nutch 1.7 and Solr on Ubuntu 14.04 LTS x64 (AMD), and when I try to run Nutch it gives me this error message:
Error: JAVA_HOME is not set.
But when I run echo $JAVA_HOME in the terminal, it prints this path: /usr/lib/jvm/java-7-openjdk-amd64
Below you can see what I've done, step by step. How can I fix this?
P.S. Ubuntu is a virtual machine running on a Mac with Oracle VirtualBox.
Setting JAVA_HOME with:
sudo nano /etc/environment
Then adding the following line at the bottom of the file: JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
Ctrl+X to save the changes and exit.
Then this command: source /etc/environment
Now JAVA_HOME should be set. I checked it with echo $JAVA_HOME, and the output is the same path as above.
Then I installed Solr with sudo apt-get -y install solr-tomcat
I checked the installation by opening this address in a browser: http://localhost:8080/solr
and it showed me the initial Solr page.
I downloaded Apache Nutch 1.7 from http://nutch.apache.org and the file was named apache-nutch-1.7-bin.tar.gz
Then I extracted it: tar -zxvf apache-nutch-1.7-bin.tar.gz
I verified Nutch's installation simply like this: cd apache-nutch-1.7 then bin/nutch. The output is like Usage: nutch COMMAND where......
Then I edited my conf/nutch-site.xml file as described here: Link (look under the title "3) Set Up Your Nutch-Site.Xml"). The only things I did differently from that reference are the MyBot and MyBot,* fields: instead of MyBot I wrote mySpider.
Then I went into Nutch's conf directory in the terminal. Here's what I did next: mkdir -p urls , cd urls , touch seed.txt , nano seed.txt
I only wrote this URL in the file, as suggested in the official Nutch tutorial: http://nutch.apache.org
After I saved my changes to the seed.txt file, I edited the conf/regex-urlfilter.txt file. I deleted these two lines:
# accept anything else
+.
Then I wrote this in their place:
+^http://([a-z0-9]*\.)*nutch.apache.org/
After that, I used this command, as suggested in the tutorial: bin/nutch crawl urls -dir crawl -depth 3 -topN 5
After this command I see this error message: Error: JAVA_HOME is not set.
I also found this article, but it didn't solve my problem either: Nutch - Getting Error: JAVA_HOME is not set. when trying to crawl
Upvotes: 1
Views: 2669
Reputation: 113
First try: readlink -f $(which java)
That will tell you where your Java binary actually lives, and therefore where JAVA_HOME should point. You should see something like:
/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
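The directory you want is that path with the trailing /bin/java removed. A minimal sketch of that step, assuming a POSIX shell; the helper name java_home_from_bin is made up for illustration and is not part of Nutch or Java:

```shell
# Hypothetical helper: strip the trailing /bin/java from a resolved java path.
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Example with the path shown above.
java_home_from_bin /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java

# On a real system the argument would come from readlink:
#   java_home_from_bin "$(readlink -f "$(which java)")"
```

For the example path this prints /usr/lib/jvm/java-7-openjdk-amd64/jre.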
Then try using this path, without the trailing bin/java, to set your JAVA_HOME just before you call the crawl script, i.e.:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre/
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
Note that the value should point to the JRE directory inside a valid JDK location.
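The export keyword is probably what makes the difference here. bin/nutch runs as a child process and only sees exported variables, while sourcing a file of plain NAME=value lines (the format /etc/environment uses) sets the variable in the current shell only. That would explain why echo prints the path but Nutch still complains. A small sketch of the difference, assuming a POSIX shell; DEMO_HOME is a made-up variable name so your real JAVA_HOME is left untouched:

```shell
# DEMO_HOME is a hypothetical variable used only for this illustration.
unset DEMO_HOME
DEMO_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # set, but not exported

# The current shell sees the value (this is what echo shows).
in_shell=$DEMO_HOME

# A child process, like bin/nutch, does not see it yet.
before_export=$(sh -c 'printf "%s" "${DEMO_HOME:-unset}"')

# After export, child processes inherit the value.
export DEMO_HOME
after_export=$(sh -c 'printf "%s" "${DEMO_HOME:-unset}"')

echo "current shell:       $in_shell"
echo "child before export: $before_export"
echo "child after export:  $after_export"
```

If the middle line reports "unset" on your machine, the same thing is likely happening with JAVA_HOME after source /etc/environment.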
P.S. You are missing the Solr URL parameter (in case you want to index the crawled documents, of course).
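For reference, a sketch of the same crawl with a Solr URL added. It assumes the -solr option of the Nutch 1.x crawl command and the Solr address from the question (http://localhost:8080/solr), so adjust both if your setup differs. The script only prints the command, so you can check it before running it:

```shell
# Assumption: Solr is reachable at the address used in the question.
SOLR_URL="http://localhost:8080/solr/"

# Assemble the crawl command; -solr is the option that passes the Solr URL.
CRAWL_CMD="bin/nutch crawl urls -solr $SOLR_URL -dir crawl -depth 3 -topN 5"

# Print it for review; run it from the Nutch directory once it looks right.
echo "$CRAWL_CMD"
```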
Upvotes: 1