Reputation: 2449
I am building a system where entries are added to a SQL database sporadically throughout the day. I am trying to create a process that imports each new entry into Solr as it arrives.
I can't seem to find any information about adding individual records to Solr from SQL. Can anyone point me in the right direction or give me a bit more information to get me going?
Any help would be much appreciated,
James
Upvotes: 1
Views: 620
Reputation: 5708
Besides DIH, you could set up a trigger in your database that fires a call to Solr's REST-style update service for every inserted, updated, or deleted row, keeping the index in sync with the table.
Also, you could set up a Filter (javax.servlet spec) in your application to intercept write requests and push them to Solr before they even reach the database (this can even be done in the same transaction, but there's rarely a real need for that; eventual consistency is usually fine for search engines).
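The push-on-write idea can be sketched in Python; the same pattern applies inside a servlet Filter or a trigger-invoked script. This is a minimal sketch, assuming a hypothetical core URL and function names (adjust `SOLR_UPDATE_URL` to your install):

```python
import json
import urllib.request

# Hypothetical Solr core URL -- adjust host, port, and core name to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commitWithin=5000"

def build_update_payload(doc):
    """Serialize a single document as a Solr JSON update body (a list of docs)."""
    return json.dumps([doc]).encode("utf-8")

def push_to_solr(doc):
    """POST one document to Solr right after the database write succeeds."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=build_update_payload(doc),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `push_to_solr({"id": "24"})` from the same code path that performs the insert keeps the index near-real-time without any polling.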
Upvotes: 0
Reputation: 81
As mentioned above, the Data Import Handler can fill your need, but an important limitation is that it does not queue requests: if the DIH is busy indexing some content and you fire off another DIH request, the second one is silently ignored and that content is not indexed.
As Ansari suggested, the more direct route is to simply HTTP POST the data to the Solr server. He pointed out the XML method, which works just fine. However, I found the http://wiki.apache.org/solr/UpdateJSON/ method simpler, as it allowed me to use native data structures when gathering the data.
When using the UpdateJSON or XML update message method, I would strongly suggest using the "commitWithin" parameter instead of "commit". A commit can be a (relatively) lengthy process that requires Solr to lock files; "commitWithin" lets Solr batch multiple update requests into a single commit, whereas "commit" forces a file lock for every POST.
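As a sketch, Solr's JSON update syntax also accepts a per-add "commitWithin" inside the request body itself; a small Python helper to build such a command might look like this (the function name is my own, and the millisecond window is just an example):

```python
import json

def json_add_command(doc, commit_within_ms=10000):
    """Build a Solr JSON update command that asks Solr to commit within
    the given window, rather than forcing an immediate hard commit."""
    return json.dumps({"add": {"doc": doc, "commitWithin": commit_within_ms}})
```

POSTing `json_add_command({"id": "24"}, 5000)` to `/solr/update/json` tells Solr it may fold this update into a commit that happens within 5 seconds, amortizing the commit cost across many requests.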
Upvotes: 1
Reputation: 8218
If you have access to the code that's adding the entries to your SQL database, just modify it to additionally create an XML string and POST it to your Solr server URL. This way you avoid a lot of complexity. For example, in PHP you might do something like this:
$url = "http://localhost:7641/solr/update";
$header = array("Content-type:text/xml; charset=utf-8");
// Note the properly closed </doc> tag; malformed XML makes Solr reject the update.
$postString = "<add><doc><field name=\"id\">24</field></doc></add>";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postString);
$data = curl_exec($ch);
// Re-use the same handle to send the commit message.
curl_setopt($ch, CURLOPT_POSTFIELDS, "<commit />");
$data = curl_exec($ch);
curl_close($ch);
If you're working in Python, use an HTTP library instead - the equivalent code will be simpler than the above.
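For instance, a rough Python equivalent using only the standard library might look like the following (the URL matches the PHP example above; the function names are my own, and you should adjust the URL to your Solr install):

```python
import urllib.request
from xml.sax.saxutils import escape

SOLR_URL = "http://localhost:7641/solr/update"  # same endpoint as the PHP example

def build_add_xml(fields):
    """Render one document (a dict of field name -> value) as a Solr <add> message."""
    parts = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % parts

def post_xml(xml):
    """POST an XML update message (an <add> or <commit />) to Solr."""
    req = urllib.request.Request(
        SOLR_URL,
        data=xml.encode("utf-8"),
        headers={"Content-type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

You would call `post_xml(build_add_xml({"id": 24}))` after each database insert, followed by `post_xml("<commit />")` (or better, append `?commitWithin=...` to the URL as discussed in another answer).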
If you don't have access to that code, add a last_modified timestamp field to your database and use the delta import functionality of the DataImportHandler to query it for new items and import them into Solr. You will have to call the DataImportHandler request handler periodically.
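A minimal sketch of such a delta-import entity, assuming a hypothetical `entries` table with `id`, `title`, and `last_modified` columns (adapt the names to your schema):

```xml
<entity name="entry" pk="id"
        query="SELECT id, title FROM entries"
        deltaQuery="SELECT id FROM entries
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM entries
                          WHERE id = '${dih.delta.id}'"/>
```

You would then hit the DIH endpoint with `command=delta-import` from a cron job or scheduled task; the handler path and port depend on how your Solr instance is configured.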
Upvotes: 1
Reputation: 8362
Have you seen the wiki page for the DataImportHandler? I believe that it does what you want.
Upvotes: 1