Reputation: 1887
I have installed Drupal 7 with the Apache Solr Search module and configured it against Apache Solr (version 4.10.4). Drupal content is indexed into Solr and searching works fine. I now need to set up the Apache Nutch web crawler (version 1.12) alongside Solr and Drupal 7, so that it fetches content from a specific URL (for example http://www.w3schools.com) and that content becomes searchable from Drupal. My problem is how to configure all three (Solr, Nutch, and Drupal 7) together. Can anyone suggest a solution?
Upvotes: 0
Views: 321
Reputation: 3253
My 2 cents on this: it looks like you want to aggregate content from your Drupal site (your nodes) with external content that is hosted on your site but not as Drupal content, right? If so, you don't need any integration between Nutch and Drupal; you just need to index everything in the same Solr core/collection. Of course, you'll need to make sure the Solr schema is compatible (Nutch has its own metadata fields, different from those of the Drupal nodes). Alternatively, if you index into separate cores/collections, you can use the shards parameter to spread your query across several cores and still get a single result set. With this approach you'll need to keep an eye on the relevance of your results (the order of the documents) and on which fields the Drupal Solr module uses to display results, so in the end you'll still need to make the schemas of both cores compatible to some degree.
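To illustrate the shards approach, here is a minimal Python sketch that builds a distributed query URL. The host, port, and the core names `drupal` and `nutch` are assumptions made up for the example, as is the helper name `build_sharded_query`; only the `shards` request parameter itself comes from Solr.

```python
from urllib.parse import urlencode


def build_sharded_query(base_url, shards, query, rows=10):
    """Build a Solr /select URL that fans the query out to several cores.

    base_url: the core that receives the request (assumed name and host).
    shards:   list of "host:port/solr/core" entries, without the scheme.
    """
    params = {
        "q": query,
        # Solr expects a comma-separated list of shard addresses.
        "shards": ",".join(shards),
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical setup: one core fed by Drupal, one core fed by Nutch.
url = build_sharded_query(
    "http://localhost:8983/solr/drupal",
    ["localhost:8983/solr/drupal", "localhost:8983/solr/nutch"],
    "html tutorial",
)
print(url)
```

Both cores still need compatible field names for the merged results to make sense, which is why the schema caveat above applies here as well.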
Upvotes: 0
Reputation: 3670
OK... here's my ugly solution, which may fit what you are doing.
You can use a PHP field (a custom field with Display Suite) in your node (or page) that reads your full page with cURL and then prints the contents right there. This field should appear only in a display of your node that nobody will see (except Apache Solr).
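The fetch-and-print step can be sketched roughly as follows. This is written in Python only to show the logic; it is not Drupal or Display Suite code, and the names `TextExtractor`, `html_to_text`, and `fetch_page_text` are hypothetical. In the actual field you would do the equivalent with PHP and cURL.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce an HTML document to a single line of plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page_text(url, timeout=10):
    """Download a page and return its text (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))


sample = (
    "<html><head><style>p{}</style></head><body><h1>Title</h1>"
    "<p>Hello <b>world</b></p><script>var x=1;</script></body></html>"
)
print(html_to_text(sample))
```

Stripping the markup before printing is a design choice of this sketch: it keeps tags and scripts out of the indexed text, whereas printing the raw page would leave that cleanup to Solr.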
Finally, in the Solr configuration (honestly, I don't remember well how it worked) you can choose which display of the page is indexed, or which field is indexed; that will be the one holding your full page.
If all this works, you don't need to integrate Nutch with Solr and Drupal.
Good luck :)
PS: If you have any questions, just ask.
Upvotes: 0