Reputation: 23
I am using CouchDB 3.1.1 to perform Mango queries against a database containing a large number of documents. A very common requirement in my application is to perform queries on a very specific and dynamic set of documents. From what I understand at this moment, these are the only choices I have on how to confront my problem:
Of the ways I can accomplish the second choice:
The second choice is what I would prefer to use since making multiple POST requests would incur overhead. Unfortunately using "$or" seems to get in the way of the query engine making use of the "_id" index.
Thus, choice #1 returns with a speedy 2 ms per transaction but the results are not sorted (requiring my application to do the sorting). Choice #2, given an array of 2 _ids, regardless of the $or syntax, takes over 3 seconds to render.
What is the most efficient way to use a CouchDB Mango query index against a specific set of documents?
Fast Example: Results using a single _id
{
"selector": {
"_id": "184094"
},
"fields": [
"_id"
]
}
documents examined: 26,312
results returned: 1
execution time: 2 ms
Slow Example: Results using $or of key / value pairs
{
"selector": {
"$or": [
{
"_id": "184094"
},
{
"_id": "157533"
}
]
},
"fields": [
"_id"
]
}
documents examined: 26,312
results returned: 2
execution time: 2,454 ms
Slow Example: Results using $or array of values
{
"selector": {
"_id": {
"$or": [
"184094",
"157533"
]
}
},
"fields": [
"_id"
]
}
documents examined: 26,312
results returned: 2
execution time: 2,522 ms
Slow Example: Results using $in (which is illegal but still returns results)
{
"selector": {
"_id": {
"$in": [
"184094",
"157533"
]
}
},
"fields": [
"_id"
]
}
documents examined: 26,312
results returned: 2
execution time: 2,618 ms
Index: The registered index for _id
{
"_id": "_design/508b5b51e6085c2f96444b82aced1e5dfec986b2",
"_rev": "1-f951eb482f9a521752adfdb6718a6a59",
"language": "query",
"views": {
"foo-index": {
"map": {
"fields": {
"_id": "asc"
},
"partial_filter_selector": {}
},
"reduce": "_count",
"options": {
"def": {
"fields": [
"_id"
]
}}}}}
Explain: An 'explain' summary done to one of the slow queries. Note that the registered index was used.
{
"dbname": "dnp_person_comment",
"index": {
"ddoc": "_design/508b5b51e6085c2f96444b82aced1e5dfec986b2",
"name": "foo-index",
"type": "json",
"partitioned": false,
"def": {
"fields": [
{
"_id": "asc"
}
]
}
},
"partitioned": false,
"selector": {
"$or": [
{
"_id": {
"$eq": "184094"
}
},
{
"_id": {
"$eq": "157533"
}
}
]
},
"opts": {
"use_index": [],
"bookmark": "nil",
"limit": 25,
"skip": 0,
"sort": {},
"fields": [
"_id"
],
"partition": "",
"r": [
49
],
"conflicts": false,
"stale": false,
"update": true,
"stable": false,
"execution_stats": false
},
"limit": 25,
"skip": 0,
"fields": [
"_id"
],
"mrargs": {
"include_docs": true,
"view_type": "map",
"reduce": false,
"partition": null,
"start_key": [],
"end_key": [
"<MAX>"
],
"direction": "fwd",
"stable": false,
"update": true,
"conflicts": "undefined"
}
}
Upvotes: 2
Views: 3619
Reputation: 1105
This should help to understand why and when $or
may be slow.
https://github.com/apache/couchdb/issues/1852
The find lookup looks at the $or
fields and if they match -- uses the corresponding field index if such exists, otherwise it uses the default _all_docs
index which means it iterates through all the documents of the database to find the docs you're looking for which is quite inefficient thing to do. Since you query IDs the default index is what you want, and it will go through all the docs looking for a match.
I still tho wonder if there is any good way to fetch docs by selector with IDs specified in it. I have been looking at the views, but you can't use selector on them
In your particular case, you can use bulk get https://docs.couchdb.org/en/stable/api/database/bulk-api.html#db-bulk-get, but in my case I have to use selector which is a hard limitation.
Upvotes: -1