NiGHTS
NiGHTS

Reputation: 23

How to use CouchDB Mango query (/db/_find) with an index to select multiple _id keys

I am using CouchDB 3.1.1 to perform Mango queries against a database containing a large number of documents. A very common requirement in my application is to perform queries on a very specific and dynamic set of documents. From what I understand at this moment, these are the only choices I have on how to confront my problem:

  1. Make multiple requests to /db/_find each with a distinct "_id"
  2. Make a single call to /db/_find

Of the ways I can accomplish the second choice:

The second choice is what I would prefer to use since making multiple POST requests would incur overhead. Unfortunately using "$or" seems to get in the way of the query engine making use of the "_id" index.

Thus, choice #1 returns with a speedy 2 ms per transaction but the results are not sorted (requiring my application to do the sorting). Choice #2, given an array of 2 _ids, regardless of the $or syntax, takes over 3 seconds to render.

What is the most efficient way to use a CouchDB Mango query index against a specific set of documents?

Fast Example: Results using a single _id

{
   "selector": {
      "_id": "184094"
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 1
execution time: 2 ms

Slow Example: Results using $or of key / value pairs

{
   "selector": {
      "$or": [
         {
            "_id": "184094"
         },
         {
            "_id": "157533"
         }
      ]
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 2
execution time: 2,454 ms

Slow Example: Results using $or array of values

{
   "selector": {
      "_id": {
         "$or": [
            "184094",
            "157533"
         ]
      }
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 2
execution time: 2,522 ms

Slow Example: Results using $in (which is illegal but still returns results)

{
   "selector": {
      "_id": {
         "$in": [
            "184094",
            "157533"
         ]
      }
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 2
execution time: 2,618 ms

Index: The registered index for _id

{
  "_id": "_design/508b5b51e6085c2f96444b82aced1e5dfec986b2",
  "_rev": "1-f951eb482f9a521752adfdb6718a6a59",
  "language": "query",
  "views": {
    "foo-index": {
      "map": {
        "fields": {
          "_id": "asc"
        },
        "partial_filter_selector": {}
      },
      "reduce": "_count",
      "options": {
        "def": {
          "fields": [
            "_id"
          ]
}}}}}

Explain: An 'explain' summary done to one of the slow queries. Note that the registered index was used.

{
 "dbname": "dnp_person_comment",
 "index": {
  "ddoc": "_design/508b5b51e6085c2f96444b82aced1e5dfec986b2",
  "name": "foo-index",
  "type": "json",
  "partitioned": false,
  "def": {
   "fields": [
    {
     "_id": "asc"
    }
   ]
  }
 },
 "partitioned": false,
 "selector": {
  "$or": [
   {
    "_id": {
     "$eq": "184094"
    }
   },
   {
    "_id": {
     "$eq": "157533"
    }
   }
  ]
 },
 "opts": {
  "use_index": [],
  "bookmark": "nil",
  "limit": 25,
  "skip": 0,
  "sort": {},
  "fields": [
   "_id"
  ],
  "partition": "",
  "r": [
   49
  ],
  "conflicts": false,
  "stale": false,
  "update": true,
  "stable": false,
  "execution_stats": false
 },
 "limit": 25,
 "skip": 0,
 "fields": [
  "_id"
 ],
 "mrargs": {
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
   "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
 }
}

Upvotes: 2

Views: 3619

Answers (1)

This should help to understand why and when $or may be slow. https://github.com/apache/couchdb/issues/1852

The find lookup looks at the $or fields and if they match -- uses the corresponding field index if such exists, otherwise it uses the default _all_docs index which means it iterates through all the documents of the database to find the docs you're looking for which is quite inefficient thing to do. Since you query IDs the default index is what you want, and it will go through all the docs looking for a match.

I still tho wonder if there is any good way to fetch docs by selector with IDs specified in it. I have been looking at the views, but you can't use selector on them

In your particular case, you can use bulk get https://docs.couchdb.org/en/stable/api/database/bulk-api.html#db-bulk-get, but in my case I have to use selector which is a hard limitation.

Upvotes: -1

Related Questions