Reputation: 11
I am trying to configure Nutch on Windows 7, and I have followed these steps.
I downloaded and unzipped Apache Nutch 1.8, then specified the agent name in conf/nutch-site.xml like this:
<configuration>
<property>
<name>http.agent.name</name>
<value>My Nutch Spider</value>
</property>
</configuration>
Then, in the Nutch home directory, I ran the following commands:
mkdir -p urls
cd urls
touch seed.txt
This creates a text file seed.txt under urls/ with the following content (one URL per line for each site you want Nutch to crawl):
nutch.apache.org/
In conf/regex-urlfilter.txt I edited the filter to:
+^([a-z0-9]*.)*nutch.apache.org/
But when I run the following from bin:
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
I get this error: bash: nutch: command not found
Why?
Upvotes: 1
Views: 859
Reputation: 807
The Nutch scripts are written for Linux environments.
You could try this Windows script (although it seems to need a lot more work before it is complete):
https://github.com/veggen/nutch-windows-script
Or set up Cygwin, as suggested here:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows
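Once a bash shell such as Cygwin is available, it may help to check how the script is being invoked. The error text comes from bash itself, and bash prints "command not found" when a bare name like nutch is not on PATH; the current directory is normally not searched. The sketch below is only an illustration: check_nutch is a hypothetical helper, not part of Nutch, and it assumes the launcher lives at bin/nutch under the Nutch home directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch): classify the launcher script at a given path.
check_nutch() {
  local script="$1"
  if [ ! -f "$script" ]; then
    echo "missing"          # wrong directory or wrong path
  elif [ ! -x "$script" ]; then
    echo "not-executable"   # could be fixed with: chmod +x "$script"
  else
    echo "ok"               # invoke it by path, e.g. ./bin/nutch
  fi
}
```

From the Nutch home directory, check_nutch bin/nutch should report ok before the crawl command is run as bin/nutch crawl urls -dir crawl -depth 3 -topN 5. From inside bin, the script would have to be called as ./nutch, and the relative urls path would then no longer point at the urls directory created earlier.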
Upvotes: 0