mP.

Reputation: 18266

Security (aka Permissions) and Lucene: How? Should it be done?

First, some background to my question.

The problem relates to introducing Lucene and performing a search that simply returns a list of matching entity instances. My code would then need to check those entities for permissions one by one. This approach is extremely inefficient: a user may only be able to see a small minority of the entities, and checking many results to return a few is far from ideal.

What approaches would developers use to solve this problem, keeping in mind that indexing and searching are performed with Lucene?

EDIT

Definitions

Indexing

Security Check

Upvotes: 19

Views: 5652

Answers (4)

pokeRex110

Reputation: 855

I would suggest having two kinds of documents:

1) Real documents, each with a field called "DocumentID"

2) Security documents with the fields "Role", "Groups", "Users", "PermissionId" and "DocumentIds"

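Then the search could look like this. This is a sketch assuming a recent Lucene API (8.x/9.x), that "DocumentIds" is a stored, multi-valued field on the security documents, and an arbitrary cap of 1000 security documents per user. It needs `Term` (org.apache.lucene.index), `BytesRef` (org.apache.lucene.util), the query classes from org.apache.lucene.search, and `java.util.List`/`ArrayList`:

   // Step 1: find the security documents that list the current user.
   TopDocs secHits = searcher.search(new TermQuery(new Term("Users", currentUser)), 1000);
   List<BytesRef> docIds = new ArrayList<>();
   for (ScoreDoc sd : secHits.scoreDocs) {
       for (String id : searcher.doc(sd.doc).getValues("DocumentIds")) {
           docIds.add(new BytesRef(id)); // an ID the current user may see
       }
   }
   // Step 2: restrict the real query to those IDs; FILTER clauses do not affect scoring.
   Query secured = new BooleanQuery.Builder()
       .add(query, BooleanClause.Occur.MUST)
       .add(new TermInSetQuery("DocumentID", docIds), BooleanClause.Occur.FILTER)
       .build();
   TopDocs results = searcher.search(secured, numberOfDocuments);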

Because Lucene is very fast at searching, making two searches is cheap. This way you also get a better tf-idf per user.

Upvotes: 0

Filipe Correia

Reputation: 5677

As Yuval mentioned, it might be worth keeping the permission mechanism independent of the Lucene index.

One way to do this is to implement your own Collector that filters out the results the user should not have access to.
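A minimal sketch of that idea in plain Java, not Lucene's actual `Collector` API: the class name, the `collect(doc, score)` callback and the `canSee` predicate (standing in for whatever external permission system you use) are all hypothetical. Every matching document passes through `collect`, and only permitted ones enter a bounded top-N heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Plain-Java model of a permission-filtering collector; NOT Lucene's Collector API.
// A (hypothetical) search loop calls collect() once for every document matching the query.
class PermissionFilteringCollector {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final IntPredicate canSee; // hypothetical hook into an external permission system
    private final int topN;
    // Min-heap ordered by score, so the weakest kept hit is evicted first.
    private final PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

    PermissionFilteringCollector(IntPredicate canSee, int topN) {
        this.canSee = canSee;
        this.topN = topN;
    }

    void collect(int doc, float score) {
        if (!canSee.test(doc)) {
            return; // the user may not see this document: it never enters the results
        }
        heap.add(new Hit(doc, score));
        if (heap.size() > topN) {
            heap.poll(); // keep only the topN best permitted hits
        }
    }

    // Permitted doc IDs, highest score first.
    List<Integer> topDocIds() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : hits) {
            ids.add(h.doc);
        }
        return ids;
    }
}
```

In Lucene itself, a check like this would sit in a leaf collector's `collect(int doc)` method, where the doc ID is relative to the current segment. Every match is still checked, so the cost grows with the number of hits; the gain over filtering afterwards is that rejection happens before the top N is cut, so a page of results is not left short.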

Upvotes: 0

Yuval F

Reputation: 20621

It depends on your security model. If permissions are simple, say you have three classes of documents, it is probably best to build a separate Lucene index per class and merge the results when a user can see more than one class. The Solr security wiki suggests something similar to HakonB's suggestion: adding the user's credentials to the query and searching by them. See also this discussion in the Lucene user group. Another strategy would be to wrap the Lucene search in a separate security class that does additional filtering outside Lucene. This may be faster if you can check the permissions using a database.
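The one-index-per-class idea can be sketched in plain Java. The `Map`-based "indexes", the `PerClassSearch` name and the term-count scoring are toy stand-ins assumed for illustration, not Lucene code: the search only visits the classes the user may see, then merges the hits by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: one "index" per security class; a search only visits the classes the user may see
// and merges their hits. The Map-based indexes and term-count scoring are toy stand-ins.
class PerClassSearch {
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final Map<String, Map<String, String>> indexes; // class -> (docId -> text)

    PerClassSearch(Map<String, Map<String, String>> indexes) {
        this.indexes = indexes;
    }

    // Toy relevance score: how often the term occurs as a whitespace-separated token.
    private static float score(String text, String term) {
        int n = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(term.toLowerCase())) {
                n++;
            }
        }
        return n;
    }

    List<String> search(String term, Set<String> visibleClasses, int topN) {
        List<Hit> merged = new ArrayList<>();
        for (String cls : visibleClasses) {
            Map<String, String> index = indexes.get(cls);
            if (index == null) {
                continue; // unknown class: nothing to search
            }
            for (Map.Entry<String, String> e : index.entrySet()) {
                float s = score(e.getValue(), term);
                if (s > 0) {
                    merged.add(new Hit(e.getKey(), s));
                }
            }
        }
        // Merge step: best score first, doc ID as a deterministic tie-breaker.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparing((Hit h) -> h.docId));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, merged.size()); i++) {
            ids.add(merged.get(i).docId);
        }
        return ids;
    }
}
```

One caveat: scores from separately searched Lucene indexes are not strictly comparable, because each index has its own term statistics. Opening the per-class indexes together through Lucene's `MultiReader` is one way to keep those statistics shared.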

Edit: I see you have a rather complex permission system. Your basic design choice is whether to implement it inside or outside Lucene. My advice is to use Lucene as a search engine (its primary strength) and use another system or application for security. If you choose to use Lucene for security anyway, I suggest you learn Lucene Filters well and use a bitset filter to filter a query's results. This approach does have the problem you listed: the permissions stored in the index have to be kept up to date.
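A bitset filter boils down to one bit per document: precompute, per user, which documents they may see, then intersect that with the query's matches. A minimal sketch with `java.util.BitSet`; `BitSetPermissionFilter`, `grant` and `revoke` are hypothetical names, and the doc IDs are assumed to be the index's internal document numbers.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of a cached per-user bitset filter: bit i is set if the user may see document i.
class BitSetPermissionFilter {
    private final Map<String, BitSet> allowedByUser = new HashMap<>();

    // Must be called on every permission change: this is the "keep permissions updated" cost.
    void grant(String user, int docId) {
        allowedByUser.computeIfAbsent(user, u -> new BitSet()).set(docId);
    }

    void revoke(String user, int docId) {
        BitSet bits = allowedByUser.get(user);
        if (bits != null) {
            bits.clear(docId);
        }
    }

    // Intersect the query's matches with the user's permissions; the input is left unchanged.
    BitSet apply(String user, BitSet queryMatches) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(allowedByUser.getOrDefault(user, new BitSet())); // unknown user sees nothing
        return result;
    }
}
```

The `grant`/`revoke` calls are where the maintenance problem shows up: every permission change must reach the cached bitsets. In a real Lucene index, internal doc IDs can also change when segments are merged, so such a cache has to be rebuilt or kept per segment.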

Upvotes: 3

HakonB

Reputation: 7075

It depends on the number of different security groups that are relevant in your context and how the security applies to your indexed data.

We had a similar issue, which we solved as follows: when indexing, we added the allowed groups to each document, and when searching, we added a boolean query with the groups the user was a member of. This performed well in our scenario.
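The group-based approach can be modelled with a toy inverted index, assuming a hypothetical "group" field that holds each document's allowed groups alongside its normal "content" terms. A search requires the content term and at least one of the user's groups; the class and field names are made up for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy inverted index: each document is indexed with its allowed groups as extra terms
// in a hypothetical "group" field, next to its normal "content" terms.
class GroupSecuredIndex {
    private final Map<String, BitSet> postings = new HashMap<>(); // "field:term" -> doc IDs

    void add(int docId, List<String> words, List<String> allowedGroups) {
        for (String w : words) {
            post("content:" + w, docId);
        }
        for (String g : allowedGroups) {
            post("group:" + g, docId);
        }
    }

    private void post(String key, int docId) {
        postings.computeIfAbsent(key, k -> new BitSet()).set(docId);
    }

    // Copy, so callers can combine postings without corrupting the index.
    private BitSet docs(String key) {
        return (BitSet) postings.getOrDefault(key, new BitSet()).clone();
    }

    // Equivalent of: +content:<word> +(group:g1 group:g2 ...) over the user's groups.
    BitSet search(String word, Set<String> userGroups) {
        BitSet allowed = new BitSet();
        for (String g : userGroups) {
            allowed.or(docs("group:" + g)); // optional clauses: any one group suffices
        }
        BitSet hits = docs("content:" + word);
        hits.and(allowed); // both the query and the group clause are required
        return hits;
    }
}
```

In Lucene terms, this corresponds to a `BooleanQuery` in which the user's query is required and a nested query of `SHOULD` clauses, one per group, is also required (or added as a `FILTER` clause so the groups do not affect scoring). The query grows with the number of groups a user belongs to, which is why the number of security groups matters here.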

Upvotes: 7
