striker

Reputation: 1263

Solr index update by query

I need to update a large number of documents in Solr very often, for example setting "online" = true for every document with user_id = 5. But indexing through the HTTP update handler is very slow. Solr supports deleting documents by query; is there a way to update by query?

Upvotes: 8

Views: 18498

Answers (5)

uncoolbob

Reputation: 109

There's still no update by query, but the answers from 2012 are out of date. Solr 4.x now has atomic updates (https://wiki.apache.org/solr/Atomic_Updates), so you can do what you want in two steps (query for the IDs of the matching documents, then send an atomic update for each ID) without needing access to the original document.
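The two steps can be sketched in Python roughly as follows. This is a minimal illustration, not a tested client: the URL in `SOLR_URL`, the core name `collection1`, the `id` uniqueKey, and the helper names are all assumptions, and atomic updates generally require the update log to be enabled and the document's fields to be stored.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adjust host, port, and core name to your installation.
SOLR_URL = "http://localhost:8983/solr/collection1"


def build_atomic_updates(ids, field, value, key="id"):
    """Return one atomic-update document per id, setting `field` to `value`."""
    return [{key: doc_id, field: {"set": value}} for doc_id in ids]


def find_ids(query, key="id", rows=1000):
    """Step 1: fetch the uniqueKey of documents matching `query`.

    Only the first `rows` matches are returned; page through the
    results if more documents can match.
    """
    params = urllib.parse.urlencode(
        {"q": query, "fl": key, "rows": rows, "wt": "json"}
    )
    with urllib.request.urlopen(f"{SOLR_URL}/select?{params}") as resp:
        docs = json.load(resp)["response"]["docs"]
    return [doc[key] for doc in docs]


def send_updates(updates):
    """Step 2: post all atomic updates in a single request, then commit."""
    req = urllib.request.Request(
        f"{SOLR_URL}/update?commit=true",
        data=json.dumps(updates).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example call against a running Solr instance (not executed here):
# send_updates(build_atomic_updates(find_ids("user_id:5"), "online", True))
```

Only the uniqueKey and the changed field travel over the wire, so the client never needs the full original document.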

Upvotes: 7

Avner Levy

Reputation: 6741

You can develop a minimal Solr plugin that does the work for you on the Solr server side.
Have a look at: Discussion on Solr mailing list

Upvotes: 1

javanna

Reputation: 60195

No, unfortunately there is no update-by-query feature. It would be really useful, as would a feature for updating a document without resubmitting it entirely; there's a five-year-old JIRA issue for that. For now, just resubmit your documents with the updated fields; they will be overwritten (that is, deleted and re-inserted) if you use the same uniqueKey.

By the way, are you making an HTTP request for each document you update? If so, you can speed things up by submitting more than one document per request, like this:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office">Bridgewater</field>
  </doc>
  <doc>
    <field name="employeeId">05992</field>
    <field name="office">Bridgewater</field>
  </doc>
  <doc>
    <field name="employeeId">05993</field>
    <field name="office">Bridgewater</field>
  </doc>
</add>
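If the documents come from application code, a message like the one above can be generated rather than written by hand. The following Python sketch uses only the standard library; the function name `build_add_message` and the sample data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize an iterable of {field: value} dicts into one <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


employees = [
    {"employeeId": "05991", "office": "Bridgewater"},
    {"employeeId": "05992", "office": "Bridgewater"},
    {"employeeId": "05993", "office": "Bridgewater"},
]
message = build_add_message(employees)
```

The resulting string can be posted to the update handler in a single HTTP request with a `Content-Type: text/xml` header.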

Upvotes: 11

Fuxi

Reputation: 5488

I would use DIH (DataImportHandler) with a modified SQL query that accepts parameters from the URL. The SQL query would look like this:

SELECT user_name, user_online FROM users WHERE user_id=${dataimporter.request.user_id}

Then, to reindex selected users, add the user_id parameter to the URL like this:

http://<host>:<port>/solr/dataimport?command=full-import&clean=false&user_id=5
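A small Python helper can build that URL for a given user. This is only a sketch: the function name and the base URL in the example are assumptions. Because the parameter is substituted into the SQL text, the helper converts it to an integer first so that arbitrary strings cannot reach the query.

```python
import urllib.parse


def dih_reindex_url(base_url, user_id):
    """Build a DIH full-import URL that reindexes a single user."""
    params = {
        "command": "full-import",
        "clean": "false",          # keep the rest of the index intact
        "user_id": int(user_id),   # raises ValueError for non-numeric input
    }
    return f"{base_url}/dataimport?{urllib.parse.urlencode(params)}"


url = dih_reindex_url("http://localhost:8983/solr", 5)
```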

Docs about using DIH and custom parameters: Solr - DataImportHandler

Upvotes: 0

Paige Cook

Reputation: 22555

As javanna answered, there is no facility to update by query. Solr also does not allow you to update individual fields of a document stored in the index, so resubmitting is the only way to update. I am curious, though, why your updates are so slow. Below are a few ways you could improve update speed.

  • If you are issuing a commit after updating each individual document, wait and issue the commit only after you have updated a batch of documents in the index. From the Solr Tutorial:

    Commit can be an expensive operation so it's best to make many changes to an index in a batch and then send the commit command at the end. There is also an optimize command that does the same thing as commit, in addition to merging all index segments into a single segment, making it faster to search and causing any deleted documents to be removed.

  • Look at using soft commits or auto soft commits to reduce the update latency. Please refer to the NearRealtimeSearch page on the Solr Wiki for more details.
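Both suggestions can be combined on the client side: send the documents in batches and ask for a commit (hard or soft) only on the last request. The Python sketch below only plans the requests and sends nothing; the helper names, batch size, and base URL are assumptions, while `commit` and `softCommit` are request parameters of the Solr update handler.

```python
import urllib.parse
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_requests(docs, batch_size, base_url, soft=False):
    """Return (url, batch) pairs; only the final request asks for a commit."""
    batches = list(batched(docs, batch_size))
    commit_param = {"softCommit": "true"} if soft else {"commit": "true"}
    plan = []
    for i, batch in enumerate(batches):
        is_last = i == len(batches) - 1
        url = f"{base_url}/update"
        if is_last:
            url += "?" + urllib.parse.urlencode(commit_param)
        plan.append((url, batch))
    return plan


plan = plan_requests(range(5), 2, "http://localhost:8983/solr/collection1")
```

Alternatively, the client can skip explicit commits entirely and rely on autoCommit and autoSoftCommit settings in solrconfig.xml.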

Upvotes: 1
