Reputation: 11
I am using Apache Nutch to crawl Rosetta Code. I don't want to crawl the entire website; I just want to crawl selected topics (e.g. http://www.rosettacode.org/mw/index.php?title=Special%3ASearch&search=Optimization+algorithms&go=Go). But the crawl does not run, and Nutch reports an error saying "No URLs to fetch - check your seed list and URL filters". Can anyone help me solve this problem?
Upvotes: 0
Views: 62
Reputation: 1170
The URL you are giving is actually being rejected at the injection phase.
You have to specify a regex that accepts the URL in regex-urlfilter.txt, or leave only the rule +.
which means it accepts all URLs.
-[?*!@=]
The above pattern rejects your URL, since it contains ? and =.
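To see why the seed URL is dropped, here is a minimal Python sketch that imitates how the rules in regex-urlfilter.txt are evaluated. It is not Nutch code: the function name apply_url_filter and the rule lists are made up for illustration, and it assumes the usual behaviour that rules are tried top to bottom, the first pattern found anywhere in the URL decides, and a URL matching no rule is rejected. The "targeted" accept rule is only one possible pattern, not something taken from the question.

```python
import re

def apply_url_filter(rules, url):
    """Hypothetical stand-in for the regex URL filter.

    rules is a list of (sign, pattern) pairs in file order.
    The first pattern found anywhere in the URL decides:
    "+" accepts the URL, "-" rejects it. No match means rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

SEED_URL = (
    "http://www.rosettacode.org/mw/index.php"
    "?title=Special%3ASearch&search=Optimization+algorithms&go=Go"
)

# The reject rule quoted in the answer, followed by the accept-everything rule.
default_rules = [("-", r"[?*!@=]"), ("+", r".")]

# Only the accept-everything rule (the "+." option from the answer).
permissive_rules = [("+", r".")]

# Assumed example: an accept rule for the search page, placed above the reject rule.
targeted_rules = [
    ("+", r"^http://www\.rosettacode\.org/mw/index\.php\?"),
    ("-", r"[?*!@=]"),
    ("+", r"."),
]

print(apply_url_filter(default_rules, SEED_URL))     # False
print(apply_url_filter(permissive_rules, SEED_URL))  # True
print(apply_url_filter(targeted_rules, SEED_URL))    # True
```

With the reject rule first, the ? in the URL matches and the URL is thrown away before the +. line is ever reached, which is why the seed list ends up empty. Putting a more specific accept rule above it, or removing the reject rule, lets the URL through.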
Upvotes: 1