İsmet Alkan

Reputation: 5447

Nutch Seed URLs

Is it possible to get URLs into Nutch directly from a database or a service? I'm not interested in approaches where the data is read from the database or service and then written to seed.txt.

Upvotes: 0

Views: 627

Answers (1)

Tejas Patil

Reputation: 6169

No. This cannot be done directly with the default Nutch codebase; you would need to modify Injector.java to achieve it.

EDIT:

Try using DBInputFormat, an InputFormat that reads input data from an SQL table. You need to modify the inject code here (line 3 in the snippet below), which is where the job is told to read the seed list from the urlDir directory:

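```java
JobConf sortJob = new NutchJob(getConf());
sortJob.setJobName("inject " + urlDir);
FileInputFormat.addInputPath(sortJob, urlDir); // reads the seed files in urlDir; swap this for a DB-backed input
sortJob.setMapperClass(InjectMapper.class);   // expects Text lines, so it must be adapted as well
```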
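With DBInputFormat in place of FileInputFormat (configured through Hadoop's `DBConfiguration.configureDB` and `DBInputFormat.setInput` in `org.apache.hadoop.mapred.lib.db`), the mapper receives one row object per table row (a class you write that implements `DBWritable`) instead of a `Text` line, so `InjectMapper` has to be adapted too. One possible way to keep the injector's existing URL handling is to rebuild, in memory, the line format it already parses: the URL followed by tab-separated `key=value` pairs, with `nutch.score` and `nutch.fetchInterval` as reserved keys. Nothing is written to seed.txt. The sketch below shows only that conversion step and assumes a Nutch 1.x injector; the `SeedRowFormatter` name and the columns it takes are hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper for a DB-backed injector: turns one table row into the
 * tab-separated "URL<TAB>key=value..." line that a Nutch 1.x InjectMapper parses.
 */
final class SeedRowFormatter {

    private SeedRowFormatter() {}

    /**
     * @param url           value of the (hypothetical) url column; required
     * @param score         optional injection score, or null to keep the configured default
     * @param fetchInterval optional fetch interval (assumed to be seconds), or null for the default
     * @param metadata      extra key/value pairs to attach to the injected URL; may be empty
     */
    static String toInjectLine(String url, Float score, Integer fetchInterval,
                               Map<String, String> metadata) {
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalArgumentException("url column must not be empty");
        }
        String cleanUrl = url.trim();
        if (cleanUrl.indexOf('\t') >= 0) {
            throw new IllegalArgumentException("url must not contain a tab: " + cleanUrl);
        }

        StringJoiner line = new StringJoiner("\t");
        line.add(cleanUrl);
        // Reserved metadata keys recognised by the Nutch 1.x injector.
        if (score != null) {
            line.add("nutch.score=" + score);
        }
        if (fetchInterval != null) {
            line.add("nutch.fetchInterval=" + fetchInterval);
        }
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            // Tab separates fields and '=' separates key from value, so a key may
            // contain neither, and a value may not contain a tab.
            if (key == null || value == null || key.isEmpty()
                    || key.indexOf('\t') >= 0 || key.indexOf('=') >= 0
                    || value.indexOf('\t') >= 0) {
                throw new IllegalArgumentException("unsafe metadata entry: " + key);
            }
            line.add(key + "=" + value);
        }
        return line.toString();
    }
}
```

In an adapted mapper, the returned string would be wrapped in `new Text(...)` and handed to the same parsing logic that previously processed lines read from urlDir. The trade-off is that the row is briefly re-serialised to text; the alternative is to copy the injector's parsing into a mapper that reads the row fields directly.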

Upvotes: 1
