Reputation: 725
I am trying to get Nutch 1.3 and Solr 3.1 working together.
Note: I am using Windows and have Cygwin installed.
I have Nutch installed and ran a basic crawl (from runtime/local):
bin/nutch crawl urls -dir crawl -depth 3
This seems to have worked, based on the logs (crawl.log): ... LinkDb: finished at 2011-10-24 14:22:47, elapsed: 00:00:02 crawl finished: crawl
I have Solr installed and verified the install at localhost:8983/solr/admin
I copied the nutch schema.xml file to the example\solr\conf folder
When I run the following command:
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*
I get the following error (hadoop.log):
2011-10-24 15:39:26,467 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: ERROR:unknown field 'content'
ERROR:unknown field 'content'
request: http://localhost:8983/solr/update?wt=javabin&version=2
...
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-10-24 15:39:26,676 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
What am I missing?
Upvotes: 2
Views: 1950
Reputation: 52779
It seems the content field definition is missing from your schema.xml.
e.g.
<field name="content" type="text" stored="false" indexed="true"/>
The example schema.xml at http://svn.apache.org/viewvc/nutch/branches/branch-1.3/conf/schema.xml?view=markup seems to have it, so you may want to check the schema.xml you copied over.
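If you want to check the copied file quickly, something like the sketch below could help. It is only an illustration: the function names are made up here, the default path assumes the stock example layout mentioned in the question, and the list of required fields contains just the one field from the error message.

```python
import sys
import xml.etree.ElementTree as ET


def declared_fields(schema_xml):
    """Return the set of names of all <field> elements in a Solr schema.xml.

    Accepts the schema as str or bytes.
    """
    root = ET.fromstring(schema_xml)
    # iter("field") matches only <field> elements,
    # not <dynamicField> or <copyField>.
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(schema_xml, required):
    """Return the required field names that the schema does not declare."""
    declared = declared_fields(schema_xml)
    return [name for name in required if name not in declared]


if __name__ == "__main__":
    # Assumed default location; pass the real path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "example/solr/conf/schema.xml"
    # Read as bytes so the parser honours the file's own XML declaration.
    with open(path, "rb") as fh:
        missing = missing_fields(fh.read(), ["content"])
    if missing:
        print("Missing field definitions:", ", ".join(missing))
    else:
        print("All required fields are declared.")
```

If it reports content as missing, the file in the Solr conf folder is probably not the Nutch schema. Also note that Solr reads schema.xml at startup, so a corrected file likely needs a Solr restart before the solrindex command will succeed.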
Upvotes: 0