David Powell

Reputation: 547

Why doesn't this Cloudant/couchdb $regex query work?

I am trying to pull (and delete) all records from our database that don't have a URL with the word 'box' in it. This is the query I'm using:

{
   "selector": {
      "$not": {
         "url": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50
}

This query returns no records. But if I remove the $not, I get all records that do have the word 'box' in the url, but that's the opposite of what I want. Why do I get no results when adding the $not?

I have tried adding a simple base to the query like "_id":{"$gte":0} but that doesn't help.

Upvotes: 1

Views: 1520

Answers (2)

Will Holley

Reputation: 1765

The "no results" issue is due to a bug in text indexes that has recently been fixed. However, neither the $not nor the $regex operator can take advantage of global indexes, so a query built on them will always result in a full database or index scan.

The way to optimise this query is to use a partial index. A partial index filters documents at indexing time rather than at query time, creating an index over a subset of the database. You then need to tell the _find endpoint to explicitly use the partial index. For example, create an index which only includes documents not matching your regex:

POST /<db>/_index
{
  "index": {
    "partial_filter_selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
    },
    "fields": ["type"]
  },
  "ddoc" : "url-not-box",
  "type" : "json"
}

then at query time:

{
   "selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50,
   "use_index": "url-not-box"
}

You can see how many documents are scanned to fulfil the query in the Cloudant UI - the execution statistics are displayed in a popup underneath the query text area.
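If you would rather check this outside the UI, the same numbers can be requested in the query body. This is only a sketch, and it assumes your Cloudant or CouchDB version supports the execution_stats option of _find. The body is the query from above with that one field added, POSTed to /<db>/_find:

```json
{
   "selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50,
   "use_index": "url-not-box",
   "execution_stats": true
}
```

When the option is supported, the response carries an execution_stats object alongside docs. Comparing its total_docs_examined with results_returned gives a rough idea of how much scanning the query needed.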

You may also find this article about partial indexes helpful.

Upvotes: 1

vabarbosa

Reputation: 706

from the Cloudant doc:

You can create more complex selector expressions by combining operators. However, for Cloudant NoSQL DB Query indexes of type json, you cannot use 'combination' or 'array logical' operators such as $regex as the basis of a query.

$not is a combination operator and therefore cannot be the basis of a query.

i am able to get the following to work:

index

{
    "index": {
        "fields": ["url"]
    },
    "name" : "url-json-index",
    "type" : "json"
}

query

{
   "selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50,
   "use_index": "url-json-index"
}

if you are still seeing problems, can you provide the output from _explain and the indexes you have in place?
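for reference, _explain takes the same body as _find. a sketch, reusing the query above and assuming it is POSTed to /<db>/_explain, where <db> stands for your database name:

```json
{
   "selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50,
   "use_index": "url-json-index"
}
```

the index field in the response names the index the query planner actually picked, which is worth comparing with the one given in use_index. a GET on /<db>/_index should list the indexes that exist in the database.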

Upvotes: 1
