Sravan

Reputation: 553

What are the space limits of a Lucene index?

I am adding billions of rows to a Lucene index; each row is almost 6000 bytes. Is there a limit on the maximum number of rows that can be added to a Lucene index? How much space would a billion rows of 6000 bytes each occupy in the index, and is there a limit on that size?

Upvotes: 6

Views: 9217

Answers (1)

jpountz

Reputation: 9964

See the Lucene documentation for its limitations. A single index cannot hold more than

  • ~274 billion distinct terms,
  • ~2.1 billion documents.

For datasets this large, it is generally a good idea to use Lucene only for its inverted index and to store the actual content of the documents somewhere else. You can expect the index to be roughly 30% of the size of the original corpus, provided these are regular documents; computationally generated documents with a lot of unique terms would produce a much bigger index.
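To put numbers on the question, here is a rough back-of-the-envelope sketch. It assumes the ~30% ratio above holds for your data, and it approximates the per-index document limit as 2**31 - 1 (the exact cap may differ slightly between Lucene versions). The function name `estimate_index` and the constant name are made up for this illustration; they are not part of Lucene.

```python
# Assumption: approximate per-index document cap (~2.1 billion, about 2**31).
# The exact value may differ slightly depending on the Lucene version.
APPROX_MAX_DOCS_PER_INDEX = 2**31 - 1


def estimate_index(num_docs, bytes_per_doc, index_percent=30):
    """Return (raw_bytes, index_bytes, min_indexes) for a corpus.

    index_percent is the assumed index-to-corpus size ratio in percent
    (30 follows the rule of thumb above; real data may differ a lot).
    """
    raw_bytes = num_docs * bytes_per_doc
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    index_bytes = raw_bytes * index_percent // 100
    # Ceiling division: how many separate indexes the document count requires.
    min_indexes = -(-num_docs // APPROX_MAX_DOCS_PER_INDEX)
    return raw_bytes, index_bytes, min_indexes


raw, idx, indexes = estimate_index(1_000_000_000, 6000)
print(f"raw corpus: {raw / 1e12:.1f} TB, "
      f"estimated index: {idx / 1e12:.1f} TB, "
      f"indexes needed: {indexes}")
```

So a billion rows of 6000 bytes is about 6 TB of raw data and, under the 30% assumption, about 1.8 TB of index, which still fits within the document limit of one index. At 5 billion rows the same calculation gives at least 3 separate indexes, since one index cannot hold more than ~2.1 billion documents.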

Upvotes: 8
