Reputation: 18266
First some background to my question.
The problem relates to introducing Lucene and performing a search that simply returns a list of matching entity instances. My code would then need to filter those entities one by one. This approach is extremely inefficient, because a user may only be allowed to see a small minority of the matches, and checking many entities to return a few is far from ideal.
How would developers approach this problem, keeping in mind that indexing and searching are performed using Lucene?
EDIT
Definitions
Indexing
Security Check
Upvotes: 19
Views: 5652
Reputation: 855
What I would suggest is having two kinds of documents:
1) Real_documents with a field called: "DocumentID"
2) A security document with fields: "Role" "Groups" "Users" "PermisionId" "DocumentsIds"
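The pseudo-code could then be:
// 1) Find the security document for the current user and read its document ids
Field[] docIds = searcher.search("Users", "currentUser").getFields("DocumentsIds");
// 2) Build a filter that accepts only real documents with one of those ids
TermsFilter filter = new TermsFilter();
for (Field field : docIds) {
    filter.addTerm(new Term("DocumentID", field.stringValue()));
}
// 3) Run the actual query restricted by the filter
searcher.search(query.getWeight(searcher), filter, numberOfDocuments);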
Because Lucene is very fast at searching, running two searches is cheap. This way you also get a better tf-idf per user.
Upvotes: 0
Reputation: 5677
As Yuval mentioned, it might be worth keeping the permission mechanism independent of the Lucene index.
One way to do it is to implement your own Collector that filters out the results the user should not have access to.
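A minimal sketch of the collector idea in plain Java, not using the Lucene API itself. The `HitCollector` interface, the `SecureCollector` class and the `search` stand-in are all hypothetical names made up for illustration; a real implementation would extend Lucene's own `Collector` and would look up permissions in whatever system holds them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical minimal collector contract: the search calls collect() once per matching document.
interface HitCollector {
    void collect(int docId);
}

// Keeps only the hits the current user is allowed to see.
class SecureCollector implements HitCollector {
    private final IntPredicate canSee;                 // permission check kept outside the index
    private final List<Integer> visible = new ArrayList<>();

    SecureCollector(IntPredicate canSee) {
        this.canSee = canSee;
    }

    @Override
    public void collect(int docId) {
        if (canSee.test(docId)) {
            visible.add(docId);
        }
    }

    List<Integer> results() {
        return visible;
    }
}

class SecureCollectorDemo {
    // Stand-in for a search: feeds every matching document id to the collector.
    static void search(int[] matchingDocs, HitCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc);
        }
    }

    public static void main(String[] args) {
        // Assumed permission rule for the demo: the user may see even-numbered documents.
        SecureCollector collector = new SecureCollector(doc -> doc % 2 == 0);
        search(new int[] {1, 2, 3, 4, 6}, collector);
        System.out.println(collector.results()); // prints [2, 4, 6]
    }
}
```

Note that the permission check still runs once per hit, so this works best when the lookup behind `canSee` is cheap.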
Upvotes: 0
Reputation: 20621
It depends on your security model. If permissions are simple, say you have three classes of documents, it is probably best to build a separate Lucene index per class and merge the results when a user can see more than one class. The Solr security Wiki suggests something similar to HakonB's suggestion: adding the user's credentials to the query and searching by them. See also this discussion in the Lucene user group. Another strategy is to wrap the Lucene search with a separate security class that does additional filtering outside of Lucene. This may be faster if you can use a database for the permissions.
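The index-per-class idea can be sketched roughly as follows, in plain Java rather than with real Lucene indexes. The `PerClassIndexes` class, the class names "public", "internal" and "secret", and the substring matching are assumptions for illustration only; with Lucene each list would be its own index, and merging would also have to deal with ranking across indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PerClassIndexes {
    // Hypothetical stand-in: one "index" (a list of document texts) per security class.
    private final Map<String, List<String>> indexByClass = new LinkedHashMap<>();

    void add(String securityClass, String text) {
        indexByClass.computeIfAbsent(securityClass, k -> new ArrayList<>()).add(text);
    }

    // Searches only the indexes of the classes the user may see and merges the hits.
    List<String> search(String keyword, Set<String> visibleClasses) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : indexByClass.entrySet()) {
            if (!visibleClasses.contains(entry.getKey())) {
                continue; // indexes of hidden classes are never searched
            }
            for (String text : entry.getValue()) {
                if (text.contains(keyword)) {
                    merged.add(text);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        PerClassIndexes indexes = new PerClassIndexes();
        indexes.add("public", "press release draft");
        indexes.add("internal", "roadmap draft");
        indexes.add("secret", "merger draft");
        Set<String> visible = new HashSet<>(Arrays.asList("public", "internal"));
        System.out.println(indexes.search("draft", visible)); // prints [press release draft, roadmap draft]
    }
}
```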
Edit: I see you have a rather complex permission system. Your basic design choice is whether to implement it inside Lucene or outside Lucene. My advice is to use Lucene as a search engine (its primary strength) and use another system/application for security. If you choose to use Lucene for security anyway, I suggest you learn Lucene Filters well, and use a bitset filter in order to filter a query's results. It does have the problems you listed of having to keep the permissions updated.
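To illustrate what a bitset filter does, here is a small sketch using `java.util.BitSet` only. It models the idea (one bit per document number, intersected with the query hits) and is not the Lucene Filter API; the `BitSetPermissionFilter` class and the `userCanSee` array are hypothetical. The allowed set would have to be rebuilt whenever permissions change, which is the update problem mentioned above.

```java
import java.util.BitSet;

class BitSetPermissionFilter {
    // Bit i is set when the user may see the document with number i.
    static BitSet allowedDocs(int docCount, boolean[] userCanSee) {
        BitSet allowed = new BitSet(docCount);
        for (int doc = 0; doc < docCount; doc++) {
            if (userCanSee[doc]) {
                allowed.set(doc);
            }
        }
        return allowed;
    }

    // Keeps only the query hits that are also in the allowed set; the inputs are not modified.
    static BitSet filter(BitSet queryHits, BitSet allowed) {
        BitSet visible = (BitSet) queryHits.clone();
        visible.and(allowed);
        return visible;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet(6);
        hits.set(1);
        hits.set(3);
        hits.set(4);
        BitSet allowed = allowedDocs(6, new boolean[] {true, false, false, true, true, false});
        System.out.println(filter(hits, allowed)); // prints {3, 4}
    }
}
```

The intersection is a single bitwise AND over the whole result set, which is why this avoids checking entities one by one.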
Upvotes: 3
Reputation: 7075
It depends on the number of different security groups that are relevant in your context and how the security applies to your indexed data.
We had a similar issue, which we solved in the following way: when indexing, we added the allowed groups to each document, and when searching, we added a boolean query with the groups the user was a member of. That performed well in our scenario.
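A minimal sketch of that approach in plain Java, without the Lucene API. The `Doc` class, the group names and the `search` helper are hypothetical stand-ins; in Lucene the same shape would presumably be a groups field on each document plus a required clause that matches any of the user's groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an indexed document: text plus the groups allowed to read it.
class Doc {
    final String id;
    final String text;
    final Set<String> allowedGroups;

    Doc(String id, String text, String... allowedGroups) {
        this.id = id;
        this.text = text;
        this.allowedGroups = new HashSet<>(Arrays.asList(allowedGroups));
    }
}

class GroupFilteredSearch {
    // Returns ids of documents that contain the keyword AND share at least one group with the user.
    static List<String> search(List<Doc> index, String keyword, Set<String> userGroups) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : index) {
            boolean matchesQuery = doc.text.contains(keyword);
            boolean matchesGroup = false;
            for (String group : userGroups) {
                if (doc.allowedGroups.contains(group)) {
                    matchesGroup = true;
                    break;
                }
            }
            if (matchesQuery && matchesGroup) {
                hits.add(doc.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(
            new Doc("1", "quarterly sales report", "sales", "managers"),
            new Doc("2", "salary report", "hr"),
            new Doc("3", "holiday schedule", "sales", "hr"));
        Set<String> userGroups = new HashSet<>(Arrays.asList("sales"));
        System.out.println(search(index, "report", userGroups)); // prints [1]
    }
}
```

Because the group restriction is part of the query itself, documents the user cannot see never appear in the result list, so no per-entity check is needed afterwards.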
Upvotes: 7