Reputation: 29
After I run this command in Nutch:
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
I get a list of URLs, say 50 of them. Does anyone know how to separate the URLs by depth?
I would like to get a result like this:
URL from depth 1 = 5 urls
url
url
url
......
URL from depth 2 = 15 urls
url
url
url
......
Something like that. Has anyone already solved this problem?
Is there a function in Nutch that does this?
Any help will be appreciated.
Upvotes: 0
Views: 151
Reputation: 6169
There is no built-in function in Nutch to do this. A simple hack is to run the nutch command with depth 1, copy the web table, and then run it again with depth 1. You will then have two versions of the Nutch web table, one for each round, and the URLs that appear only in the later copy are the ones added in that round.
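Once you have one copy per round, grouping by depth is just a set difference between consecutive copies. Here is a minimal sketch in Python, assuming you have already exported each copy to a plain list of URLs (how you export them depends on your Nutch version and storage backend, so that part is left out). The names `urls_by_depth` and `format_report` are made up for illustration and are not part of Nutch.

```python
def urls_by_depth(snapshots):
    """Group URLs by the round in which they first appear.

    snapshots: list of URL lists, one per crawl round, in crawl order
    (hypothetical input: one exported copy of the web table per round).
    Returns a dict mapping depth (starting at 1) to a sorted list of
    the URLs that were not present in any earlier snapshot.
    """
    seen = set()
    by_depth = {}
    for depth, snapshot in enumerate(snapshots, start=1):
        # Keep only URLs that no earlier round has already recorded.
        new_urls = sorted(set(snapshot) - seen)
        by_depth[depth] = new_urls
        seen.update(new_urls)
    return by_depth


def format_report(by_depth):
    """Render the grouping in the layout asked for in the question."""
    lines = []
    for depth, urls in by_depth.items():
        lines.append(f"URL from depth {depth} = {len(urls)} urls")
        lines.extend(urls)
    return "\n".join(lines)
```

You would call it as `print(format_report(urls_by_depth([round1_urls, round2_urls])))`, where `round1_urls` and `round2_urls` are the lists read from your two copies. Note that a copy may also contain URLs that were discovered but not yet fetched in that round, so filter on status first if you only want fetched pages.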
Upvotes: 1