Ricxzone05

Reputation: 29

Nutch: How to separate the URLs found at depth 1 from those found at depth 2

After I run this command in Nutch:

bin/nutch crawl urls -dir crawl -depth 3 -topN 5

I get a list of URLs, say 50 of them. Does anyone know how to separate these URLs by the depth at which they were found?

So that I get a result like this:

URLs from depth 1 = 5 URLs

......

URLs from depth 2 = 15 URLs

......

Something like that. Has anyone already solved this problem?

Is there a function in Nutch that does this?

Any help will be appreciated.

Upvotes: 0

Views: 151

Answers (1)

Tejas Patil

Reputation: 6169

There is no built-in function in Nutch to do this. A simple hack is to run the nutch command with depth 1, copy the web table, and then run it again with depth 1. You will then have two versions of the Nutch web table, one for each round, and comparing them shows which URLs were added in the second round.
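A rough sketch of that hack as a shell script, in case it helps. It assumes the Nutch 1.x layout implied by the `-dir crawl` option in the question (a `crawl/crawldb` directory), that re-running `crawl` on the same directory continues from the existing crawldb, and that `readdb -dump` writes plain-text `part-*` files in which each record starts with the URL followed by a tab. The function names `snapshot_round` and `urls_new_in_round` and the output file names are made up for illustration and are not part of Nutch.

```shell
#!/bin/sh

# Hypothetical helper: run one crawl round, then save the URLs known so far.
# Assumption: Nutch 1.x, crawldb at crawl/crawldb, dump lines look like
# "<url><TAB>Version: ..." for the first line of each record.
snapshot_round() {
    round="$1"
    bin/nutch crawl urls -dir crawl -depth 1 -topN 5
    bin/nutch readdb crawl/crawldb -dump "dump_round_$round"
    # Keep only the record lines that start with a URL, then the URL column.
    cat "dump_round_$round"/part-* \
        | grep -E '^https?://' \
        | cut -f1 \
        | sort -u > "urls_round_$round.txt"
}

# Print the URLs that appear in the second list but not in the first.
urls_new_in_round() {
    before="$1"
    after="$2"
    tmp_before=$(mktemp)
    tmp_after=$(mktemp)
    # comm needs sorted input, so sort both lists first.
    sort -u "$before" > "$tmp_before"
    sort -u "$after" > "$tmp_after"
    # -13 hides lines unique to the first file and lines common to both.
    comm -13 "$tmp_before" "$tmp_after"
    rm -f "$tmp_before" "$tmp_after"
}
```

With these defined, something like `snapshot_round 1; snapshot_round 2; urls_new_in_round urls_round_1.txt urls_round_2.txt` would list the URLs that first showed up in the second round. Note that the crawldb also contains URLs that were discovered but not yet fetched, so you may want to filter on the status field of the dump if you only care about fetched pages.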

Upvotes: 1
