Reputation: 11
I am using Solr 7.3.1 for indexing documents. Right now it indexes every document under the configured location, which is quite large (close to 1 TB), and a full pass over the folder takes 3-4 days. Meanwhile documents keep getting edited, added, and deleted every hour. What is the best approach to keep the Solr index updated?
Upvotes: 1
Views: 230
Reputation: 52792
Create a small application that listens to file system events inside the document hierarchy where the documents are stored.
That way you can send the documents to Solr as soon as they're written to disk. Exactly how you do that depends on your operating system and the languages you can write code in. Under Linux there are hooks for inotify that you can use through inotifywait and bash, or through inotify as a Python module.
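The inotify bindings are not part of Python's standard library, so as a portable stand-in here is a minimal polling sketch of the same idea: detect files modified since the last scan and hand each one to a callback (which would post the file to Solr). The function names and the 5-second interval are illustrative, not from any particular library:

```python
import os
import time


def changed_since(root, since):
    """Return paths under `root` whose modification time is newer than `since`."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since:
                    changed.append(path)
            except OSError:
                pass  # file was deleted between listing and stat; skip it
    return changed


def watch(root, interval=5.0, handle=print):
    """Poll `root` every `interval` seconds and pass new/edited files to `handle`."""
    last = time.time()
    while True:
        now = time.time()
        for path in changed_since(root, last):
            handle(path)  # e.g. send the document to Solr here
        last = now
        time.sleep(interval)
```

A real inotify-based watcher reacts to kernel events instead of scanning, which matters at this scale; the polling version is only meant to show the shape of the loop.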
That way you can index any updated document as soon as it has been written to disk, and you can do this while the regular, initial indexing operation runs.
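For the "send the document to Solr" step, one option is Solr's extracting request handler (`/update/extract`, i.e. Solr Cell), which runs Tika server-side to pull text out of rich documents. A minimal stdlib-only sketch, assuming a core named `docs` on `localhost:8983` (both hypothetical) and using the file path as the document id:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical Solr core URL; adjust host and core name for your setup.
SOLR = "http://localhost:8983/solr/docs"


def extract_url(doc_id, commit_within_ms=60000):
    """Build a /update/extract URL that indexes one document under `doc_id`."""
    params = urlencode({
        "literal.id": doc_id,          # stored as the uniqueKey field
        "commitWithin": commit_within_ms,  # let Solr batch commits
    })
    return f"{SOLR}/update/extract?{params}"


def index_file(path):
    """Stream the raw file bytes to Solr; Tika extracts the text server-side."""
    with open(path, "rb") as f:
        req = Request(extract_url(path), data=f.read(),
                      headers={"Content-Type": "application/octet-stream"})
        urlopen(req)  # raises on HTTP errors
```

Using `commitWithin` instead of `commit=true` on every request is important here: with documents changing every hour, per-document hard commits would hurt throughput.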
However, if every document changes each hour (meaning you have to re-index every single document within the hour, every hour), you'll have to scale your infrastructure to index the content that fast. Exactly how to do that depends on many factors (document types, available libraries, other constraints in the project, etc.) and is probably outside of what can be decently answered here.
Upvotes: 1