Reputation: 14204
I am working on a project that uses Elasticsearch. I have my core search UI working. I'm now looking to improve some things. In this process, I discovered that I do not really understand what happens during "indexing". I understand what an index is. I understand what a document is. I understand that indexing happens either a) when a document is added b) when a document is updated) or c) when the refresh endpoint is called.
Still, I do not really understand the detail behind indexing. For example, does indexing happen if a document is removed? What really happens during indexing? I keep looking for some documentation that explains this. However, I'm not having any luck.
Can someone please explain what happens during indexing and possibly point out some documentation?
Thank you!
Upvotes: 4
Views: 2890
Reputation: 16335
Indexing is a huge process and has a lot of steps involved in it. I will try to provide a brief intro to the major steps in indexing process
Every word in a text field needs to be searchable,
The data structure that best supports the multiple-values-per-field requirement is the inverted index. The inverted index contains a sorted list of all of the unique values, or terms, that occur in any document and, for each term, a list of all the documents that contain it.
First of all, please do note that a "lucene index is immutable"
Hence, in case of any (CRUD (-R)) operation, instead of rewriting the whole inverted index, lucene adds new supplementary indices to reflect more-recent changes.
Indexing Process
Every so often, the buffer is commited:
The in-memory buffer is cleared, and is ready to accept new documents.
What happens in case of Delete
Segments are immutable, so documents cannot be removed from older segments.
When a document is “deleted,” it is actually just marked as deleted in the .del file. A document that has been marked as deleted can still match a query, but it is removed from the results list before the final query results are returned.
When is it actually removed
In Segment Merging, deleted documents are purged from the filesystem.
References :
Upvotes: 4