DNA

Reputation: 42607

Solr document disappears when I update it

I am trying to update existing documents in a (Sentry-secured) Solr collection. The updates are accepted by Solr, but when I query, the document seems to have disappeared from the collection.

What is going on?

I am using Cloudera (CDH) 5.8.3, and Sentry with document-level access control enabled.

Upvotes: 0

Views: 223

Answers (1)

DNA

Reputation: 42607

When using document-level access control, Sentry uses a field (whose name is defined in solrconfig.secure.xml, but the default is sentry_auth) to determine which roles can see that document.

An update replaces the whole document, so if you update a document but forget to supply a sentry_auth field, the updated document doesn't belong to any roles and nobody can see it - it becomes essentially invisible! This is an easy mistake to make, because the sentry_auth field is typically not a stored field, so it won't be returned by any queries.

You therefore cannot just retrieve a document, modify a field, then update the document - you need to know which roles that document belongs to, so you can supply a properly-populated sentry_auth field.
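As a rough illustration of the point above, here is a minimal Python sketch that attaches the roles to a document before it is sent as an update. The helper name `build_update`, the document fields and the role names are all made up for the example, the field name assumes the default sentry_auth, and the body shape assumes Solr's JSON update format (a JSON array of documents). Sending the request is left out.

```python
import json

# Default field name mentioned above; yours may differ (see solrconfig.secure.xml).
SENTRY_AUTH_FIELD = "sentry_auth"


def build_update(doc, roles, auth_field=SENTRY_AUTH_FIELD):
    """Return a JSON update body for one document with the auth field populated.

    `doc` is the full set of fields to write; `roles` must come from your own
    records, because the auth field is typically not returned by queries.
    """
    updated = dict(doc)                # copy, so the caller's dict is not modified
    updated[auth_field] = list(roles)  # one value per role
    return json.dumps([updated])       # a JSON array containing one document


# Hypothetical document and roles, for illustration only.
original = {"id": "doc-1", "title": "New title"}
body = build_update(original, ["analyst", "admin"])
print(body)
```

The important part is only that the roles travel with every update; how you look them up (a side table, the source system, and so on) depends on your setup.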

You can make the sentry_auth field a "required" field, in the Solr schema, which will prevent you from accidentally omitting it.

However, this won't prevent you from supplying a blank sentry_auth field (or supplying incorrect roles), either of which will also make the document "disappear".
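Since the schema cannot catch blank or mistyped roles, one option is a client-side check before the update is sent. The following is only a sketch: `check_roles` and the `known_roles` set are hypothetical, and it assumes your application has some list of valid role names to compare against.

```python
def check_roles(roles, known_roles):
    """Return the cleaned role list, or raise if it is blank or contains unknown roles."""
    # Drop empty or whitespace-only entries and trim the rest.
    cleaned = [r.strip() for r in roles if r and r.strip()]
    if not cleaned:
        raise ValueError("no roles supplied; the document would be invisible to everyone")
    # Anything not in the known set is probably a typo or a stale role name.
    unknown = sorted(set(cleaned) - set(known_roles))
    if unknown:
        raise ValueError("unknown roles: " + ", ".join(unknown))
    return cleaned


# Hypothetical role names, for illustration only.
known = {"analyst", "admin"}
roles_to_write = check_roles([" analyst ", "admin"], known)
print(roles_to_write)
```

This does not replace making the field required in the schema; it just covers the two cases the schema cannot.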

Also note that you can update a document that you do not have document-level access to, provided you have write access to the collection as a whole and you have the ID of the document. This means that users can (deliberately or accidentally) overwrite or delete documents that they cannot see. This is a design choice, made so that users cannot find out whether a particular document ID exists when they do not have document-level access to it.

See the Cloudera documentation:

Upvotes: 0
