Reputation: 606
I want to crawl a specific site that uses cookies for authentication, so I need to set cookie and user-agent information in every GET request that Apache Nutch makes while crawling it.
How do I specify the cookie information in the config, or do I need to write a custom plugin for this?
Upvotes: 1
Views: 531
Reputation: 3253
At the moment there is no way to manually specify a cookie or header for Nutch to send when fetching URLs. The protocol-httpclient plugin has some support for form-based authentication; take a look at the httpclient-auth.xml file. I don't think support for custom cookies or headers would be too hard to implement, and we always welcome contributions.
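For anyone considering the custom plugin route, here is a minimal sketch of what such a change would need to do at the HTTP level: attach a User-Agent and a single Cookie header to each GET. It uses only the JDK's `HttpURLConnection`, not any Nutch API. The class name `CookieHeaderSketch`, the helpers `buildHeaders` and `prepareGet`, and the cookie and agent values are all hypothetical, and how this would be wired into a Nutch protocol plugin is left open.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; this is not part of Nutch.
public class CookieHeaderSketch {

    // Builds the request headers: a User-Agent plus one Cookie header
    // whose name=value pairs are separated by "; ".
    static Map<String, String> buildHeaders(String userAgent, Map<String, String> cookies) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", userAgent);
        if (!cookies.isEmpty()) {
            StringJoiner joined = new StringJoiner("; ");
            for (Map.Entry<String, String> e : cookies.entrySet()) {
                joined.add(e.getKey() + "=" + e.getValue());
            }
            headers.put("Cookie", joined.toString());
        }
        return headers;
    }

    // Prepares a GET request carrying the headers. Nothing is sent until
    // the caller connects or reads the response.
    static HttpURLConnection prepareGet(String url, Map<String, String> headers) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        headers.forEach(conn::setRequestProperty);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; a real session cookie would come from a login step.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("sessionid", "abc123");
        cookies.put("lang", "en");

        Map<String, String> headers = buildHeaders("my-crawler/0.1", cookies);
        HttpURLConnection conn = prepareGet("https://example.com/", headers);

        System.out.println(conn.getRequestMethod() + " " + conn.getURL());
        headers.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

The cookies are joined into one header because HTTP clients send a single Cookie header per request. Session cookies expire, so a real crawl would also need some way to refresh them.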
Upvotes: 1