Reputation: 2725
We have a use case where we spin up an embedded Solr server (using the SolrJ EmbeddedSolrServer API) from a remote Solr instance, so that we can serve documents extremely fast in a query pipeline.
One thing I am stuck on is how to determine whether the remote Solr instance has been modified in any way since the last sync. The naive approach is to compare documents one at a time, but that would be extremely inefficient and would defeat the purpose of being fast.
Thanks for any tips or recommendations.
Upvotes: 2
Views: 220
Reputation: 52802
Each version of the Lucene index is assigned a version number. This version number is exposed through the Replication Handler (which you might already be using to replicate the index to your local embedded Solr instance):
http://host:port/solr/core_name/replication?command=indexversion
Returns the version of the latest replicatable index on the specified master or slave.
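As a rough sketch of how that check could look from the client side, the snippet below requests the index version over plain HTTP (JDK 11+ `HttpClient`) and compares it with the value remembered from the last sync. The class and method names are hypothetical, the `wt=json` parameter and the shape of the JSON response are assumptions about a typical Solr setup, and the regex parsing is only there to avoid pulling in a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SolrJ.
class IndexVersionCheck {
    // Assumes the response contains a numeric "indexversion" field.
    private static final Pattern INDEX_VERSION =
            Pattern.compile("\"indexversion\"\\s*:\\s*(\\d+)");

    // coreUrl is e.g. "http://host:port/solr/core_name" (no trailing slash).
    static String indexVersionUrl(String coreUrl) {
        return coreUrl + "/replication?command=indexversion&wt=json";
    }

    // Extracts the index version from the response body.
    static long parseIndexVersion(String json) {
        Matcher m = INDEX_VERSION.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no indexversion field in response");
        }
        return Long.parseLong(m.group(1));
    }

    // Performs the HTTP call against the remote instance.
    static long fetchIndexVersion(String coreUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(indexVersionUrl(coreUrl)))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return parseIndexVersion(response.body());
    }

    // A resync is needed whenever the remote version differs from the stored one.
    static boolean needsSync(long remoteVersion, long lastSyncedVersion) {
        return remoteVersion != lastSyncedVersion;
    }
}
```

The idea would be to store the value returned by `fetchIndexVersion` after each successful sync and call `needsSync` before the next one, so an unchanged index costs a single small request.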
If you want to do it more manually, you can use the _version_ field that is automagically added to all documents in recent versions of Solr, and fetch any documents whose _version_ value is larger than the current largest version in your index. This assumes you use the default _version_ numbering (which you kind of have to, since it's also used internally for SolrCloud).
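A minimal sketch of the query side of that idea, using only the JDK: it builds a range query with an exclusive lower bound on `_version_` and turns it into a `/select` URL. The class and method names are hypothetical, and the sketch assumes `_version_` is searchable and sortable in your schema. Note that documents deleted on the remote side will not show up in such a query, so deletions need separate handling.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SolrJ.
class VersionDeltaQuery {
    // Range query with an exclusive lower bound: every document whose
    // _version_ is strictly greater than the largest one held locally.
    static String newerThan(long maxLocalVersion) {
        return "_version_:{" + maxLocalVersion + " TO *]";
    }

    // coreUrl is e.g. "http://host:port/solr/core_name" (no trailing slash).
    // Sorting by _version_ ascending lets the caller page through the delta
    // and advance the lower bound to the last version it has seen.
    static String selectUrl(String coreUrl, long maxLocalVersion, int rows) {
        String q = URLEncoder.encode(newerThan(maxLocalVersion), StandardCharsets.UTF_8);
        String sort = URLEncoder.encode("_version_ asc", StandardCharsets.UTF_8);
        return coreUrl + "/select?q=" + q + "&sort=" + sort + "&rows=" + rows + "&wt=json";
    }
}
```

The same query string from `newerThan` could also be passed to a SolrJ query object instead of being sent over raw HTTP.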
Upvotes: 3
Reputation: 8658
If you want to track individual documents, you can add a date field on the Solr side that is set for every document.
That is, add a new date field to the schema file, named UpdateDateTime, and update it every time a document is modified or newly added.
I am not sure how you are handling document deletion on the Solr side. If you are not tracking deletions yet, you can add another boolean field, isDeleted.
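To illustrate, here is a small sketch of the query strings a sync job might build from those two fields. It assumes UpdateDateTime is a Solr date field and isDeleted a boolean field as described above; the class and method names are hypothetical.

```java
import java.time.Instant;

// Hypothetical helper, not part of SolrJ.
class UpdatedSinceQuery {
    // Everything touched strictly after the last sync. Instant.toString()
    // yields an ISO-8601 UTC timestamp such as 2024-01-01T00:00:00Z.
    static String modifiedSince(Instant lastSync) {
        return "UpdateDateTime:{" + lastSync + " TO *]";
    }

    // Documents to add or update in the embedded index.
    static String upsertsSince(Instant lastSync) {
        return modifiedSince(lastSync) + " AND isDeleted:false";
    }

    // Documents flagged as deleted, to be removed from the embedded index.
    static String deletesSince(Instant lastSync) {
        return modifiedSince(lastSync) + " AND isDeleted:true";
    }
}
```

With this scheme, a "deleted" document stays in the remote index with isDeleted set to true, so the sync job can still see it and remove the local copy.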
Upvotes: 1