Falci

Reputation: 1873

What is the best practice for indexing data from multiple users in Lucene?

I have a multiuser system. Each user creates indexable content, but each user can only search their own content.

Which approach is better?

  1. Create a single index directory, index everything in it, and filter by user when searching.
  2. Create a separate index directory for each user and return all results from that user's index.

Upvotes: 1

Views: 107

Answers (1)

mindas

Reputation: 26723

If users never need to search each other's content, I would go for the second option. With a single shared index, every search pays for the filter and runs against a much larger corpus, so queries might take longer. A single large index also brings scalability issues and unnecessary GC overhead.

The downside is that you will likely not benefit from the field cache, because you would have to open and close the index for each user every time. You can alleviate this by identifying which users are still active and keeping their readers open.
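One way to keep readers open for active users is a small least-recently-used cache. The sketch below is only an illustration under stated assumptions: `OpenReaderCache`, `Opener`, `readerFor`, and `openCount` are hypothetical names, the reader type is left generic as any `Closeable`, and in a real Lucene setup the opener would presumably wrap something like `DirectoryReader.open` on that user's directory.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: keeps at most maxOpen readers open, one per user,
// and closes the least recently used reader when the limit is exceeded.
final class OpenReaderCache<R extends Closeable> {

    // Hypothetical callback that opens the reader for one user's index.
    interface Opener<R> {
        R open(String userId) throws IOException;
    }

    private final Opener<R> opener;
    private final Map<String, R> readers;

    OpenReaderCache(int maxOpen, Opener<R> opener) {
        this.opener = opener;
        // accessOrder = true makes iteration order least recently used first.
        this.readers = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // release the evicted reader
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    // Returns the cached reader for the user, opening it on first use.
    synchronized R readerFor(String userId) {
        R reader = readers.get(userId);
        if (reader == null) {
            try {
                reader = opener.open(userId);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            readers.put(userId, reader);
        }
        return reader;
    }

    synchronized int openCount() {
        return readers.size();
    }
}
```

Note that this sketch closes an evicted reader immediately, even if another thread is still searching with it. A production version would need some form of reference counting before closing.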

Sotirios Delimanolis raised the point that 10M directories might be a pain to manage. This is a valid point: putting that many files or directories in a single directory does not scale on most file systems. But you can always distribute these directories so that they form a nicely balanced tree.
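A minimal sketch of such a layout, assuming each user has a string ID that is safe to use as a directory name: hash the ID and use the first two bytes of the hash as two directory levels. `UserIndexPaths` and `pathFor` are hypothetical names, and the choice of SHA-256 and two levels is just one possibility. With 256 × 256 buckets, 10M users would average roughly 150 directories per leaf.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that maps a user ID to a sharded index directory,
// e.g. <root>/<xx>/<yy>/<userId>, where xx and yy come from a hash of the ID.
final class UserIndexPaths {

    private UserIndexPaths() {}

    static Path pathFor(Path root, String userId) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        // Two hex-encoded bytes give 256 * 256 = 65,536 evenly filled buckets.
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        // Assumption: userId contains no path separators or other unsafe characters.
        return root.resolve(level1).resolve(level2).resolve(userId);
    }
}
```

Because the path is derived only from the user ID, no lookup table is needed to find a user's index again.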

Upvotes: 2
