Reputation: 1469
I've been consulting one tutorial after another and have spent oodles of time searching.
I installed SOLR from scratch and started it up.
bin/solr start
I successfully navigate to the SOLR admin. Then I create a new core.
bin/solr create -c core_wiki -d basic_configs
I look at the help for the bin/post command.
bin/post -h
...
* Web crawl: bin/post -c gettingstarted http://lucene.apache.org/solr -recursive 1 -delay 1
...
So I try to make a similar call... but I keep getting a FileNotFound error.
bin/post -c core_wiki http://localhost:8983/solr/ -recursive 1 -delay 10
/usr/lib/jvm/java-7-openjdk-amd64/jre//bin/java -classpath /home/ubuntu/src/solr-5.4.0/dist/solr-core-5.4.0.jar -Dauto=yes -Drecursive=1 -Ddelay=10 -Dc=core_wiki -Ddata=web org.apache.solr.util.SimplePostTool http://localhost:8983/solr/
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/core_wiki/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, depth=1, delay=10s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/core_wiki/update/extract?literal.id=http%3A%2F%2Flocalhost%3A8983%2Fsolr&literal.url=http%3A%2F%2Flocalhost%3A8983%2Fsolr
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/core_wiki/update/extract. Reason:
<pre> Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/core_wiki/update/extract?literal.id=http%3A%2F%2Flocalhost%3A8983%2Fsolr&literal.url=http%3A%2F%2Flocalhost%3A8983%2Fsolr
SimplePostTool: WARNING: An error occurred while posting http://localhost:8983/solr
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/core_wiki/update/extract...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/core_wiki/update/extract?commit=true
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/core_wiki/update/extract. Reason:
<pre> Not Found</pre></p><hr><i><small>Powered by Jetty://</small></i><hr/>
</body>
</html>
Time spent: 0:00:00.041
I'm still fairly new to SOLR indexing. Any hints that could point me in the right direction would be appreciated.
Upvotes: 1
Views: 1653
Reputation: 16085
It seems that the request handler named /update/extract is missing from your configuration.
The ExtractingRequestHandler is not incorporated into the solr war file; it is provided as a SolrPlugins, and you have to load it (and its dependencies) explicitly. (Apache Solr Wiki)
It should be defined in solrconfig.xml, like this:
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />
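Declaring the handler is usually not enough on its own, because the Solr Cell jars also have to be on the core's classpath. Here is a rough sketch of what the relevant part of solrconfig.xml might look like. It assumes the stock Solr 5.x directory layout (contrib/extraction/lib and dist/ under the install directory) and that your schema has a dynamic field matching ignored_*; neither is confirmed by the question, so adjust the paths and the defaults to your setup.

```xml
<!-- Assumption: default Solr 5.x install layout; change the dir values if yours differs. -->
<!-- Loads Tika and the other extraction dependencies. -->
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<!-- Loads the Solr Cell jar that contains ExtractingRequestHandler. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Normalize extracted metadata names to lowercase with underscores. -->
    <str name="lowernames">true</str>
    <!-- Assumption: the schema defines a dynamic field "ignored_*";
         extracted fields unknown to the schema are routed there. -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

After editing the file, restart Solr or reload the core so the change takes effect, then rerun the bin/post command. If the 404 persists, check the Solr log for class-loading errors, which would suggest the lib paths do not point at the jars.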
Upvotes: 1