Reputation: 2337
I know that Lucene creates an index and stores all the data. Can anyone tell me how the data is stored in flat files, or what kind of algorithms it uses to store the data on the backend so that it can be retrieved quickly?
Upvotes: 16
Views: 17069
Reputation: 4774
I don't know if this is what you asked for, but the general answer is that Lucene implements an inverted index. You can find the specifics of how Lucene stores it in the file formats documentation (as milan said).
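To illustrate the general idea (this is not Lucene's code or its real file format), here is a minimal Python sketch of an inverted index: each term maps to a sorted postings list of document IDs, and an AND query intersects two postings lists with a linear merge. To hint at how postings can be written compactly to a flat file, it also stores the gaps between consecutive doc IDs (delta encoding) and writes each gap as a variable-length integer: 7 data bits per byte, low-order group first, high bit set when more bytes follow. Lucene's file-format docs describe this kind of delta-coded, VInt-based storage, but its current codecs add block compression, skip data and a term dictionary on top. All function names here are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """AND query: merge two sorted postings lists in O(len(a) + len(b))."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def write_vint(value, out):
    """Append a non-negative int: 7 bits per byte, low-order group first,
    high bit set on every byte except the last."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_vint(data, pos):
    """Decode one variable-length int starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def encode_postings(doc_ids):
    """Serialize a sorted postings list as: count, then gaps between doc IDs."""
    out = bytearray()
    write_vint(len(doc_ids), out)
    prev = 0
    for doc_id in doc_ids:
        write_vint(doc_id - prev, out)  # small gaps -> few bytes
        prev = doc_id
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild absolute doc IDs from the gaps."""
    count, pos = read_vint(data, 0)
    doc_ids, prev = [], 0
    for _ in range(count):
        gap, pos = read_vint(data, pos)
        prev += gap
        doc_ids.append(prev)
    return doc_ids


docs = ["lucene stores an inverted index",
        "an index maps terms to documents",
        "lucene is a search library"]
index = build_inverted_index(docs)
print(index["lucene"])                             # [0, 2]
print(intersect(index["lucene"], index["index"]))  # [0]
blob = encode_postings([3, 7, 300])
print(list(blob))                                  # [3, 3, 4, 165, 2]
print(decode_postings(blob))                       # [3, 7, 300]
```

Sorted postings are what make both tricks work: the merge-based intersection needs sorted input, and sorting keeps the gaps small, so most of them fit in a single byte.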
The general idea is that Lucene stores an inverted index plus other auxiliary data structures that help answer queries quickly. For example, it stores a norm (length-normalization) value per document for each indexed field, and each term's document frequency, from which the IDF (inverse document frequency) is computed at search time. Lucene also stores the original field values (stored fields), but those are kept outside the inverted index.
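To make those auxiliary structures concrete, here is a toy Python sketch (the class and method names are hypothetical, not Lucene's API). Postings carry per-document term frequencies; a term's document frequency is just the length of its postings, and IDF is derived from it; each document gets a length norm; and the original field values go into a separate stored-fields list that matching never touches. The scoring is plain tf-idf with a 1/sqrt(length) norm, chosen for simplicity. Lucene's actual similarity (BM25 by default in recent versions) uses different formulas.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy index: postings with term frequencies, per-document length norms,
    and stored fields kept in a structure separate from the postings."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.norms = []                    # doc_id -> 1/sqrt(number of terms)
        self.stored = []                   # doc_id -> original field values

    def add(self, fields):
        doc_id = len(self.stored)
        terms = fields["body"].lower().split()
        for term, tf in Counter(terms).items():
            self.postings[term][doc_id] = tf
        self.norms.append(1.0 / math.sqrt(len(terms)) if terms else 0.0)
        self.stored.append(dict(fields))  # only used to display results
        return doc_id

    def doc_freq(self, term):
        # Number of documents containing the term = length of its postings.
        return len(self.postings.get(term, {}))

    def idf(self, term):
        # Textbook IDF computed from the document frequency.
        df = self.doc_freq(term)
        return math.log(len(self.stored) / df) if df else 0.0

    def search(self, query):
        scores = defaultdict(float)
        for term in query.lower().split():
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * idf * self.norms[doc_id]
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        # Look up stored fields only for the hits, after scoring.
        return [(self.stored[d]["title"], score) for d, score in ranked]


idx = TinyIndex()
idx.add({"title": "a", "body": "lucene stores an inverted index"})
idx.add({"title": "b", "body": "an index maps terms to documents"})
idx.add({"title": "c", "body": "lucene is a search library"})
print([title for title, _ in idx.search("lucene index")])  # ['a', 'c', 'b']
```

Document "a" wins because it matches both query terms. Between "c" and "b", which each match one term with the same IDF, the shorter document gets the larger norm and ranks higher.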
Upvotes: 8
Reputation: 2113
You can read the book Introduction to Information Retrieval, http://nlp.stanford.edu/IR-book/, to learn about the data structures, algorithms and models used in information retrieval systems.
Upvotes: 4