Igl3
Igl3

Reputation: 5108

Composite key querying in couchbase 4.0

I got a view like this:

function (doc, meta) {
  if(doc.type){
    var id = doc.id ? doc.id: "";
    var company = doc.company ? doc.company: "";
    var store = doc.store ? doc.store: "";

    emit([doc.type, id, company, store]);
  }
}

And documents which all contain a type and a combination of the other 3 fields, depending on the type. I want to query generically via this view with the following function:

def find_by_type_pageing_by_id_company_store(self, format_function=None, page=None, rows=None, recent=None, type=None, id="", company="", store="", include_docs=True):

    if not type:
        logger.error("No Type Provided in find by type query")
        raise exceptions.InvalidQueryParams("No Type Provided in find by type query")

    view = VIEW_BY_TYPE_VIN_COMPANY_STORE

    cb = self.get_cb_bucket()

    query = Query()

    # 'recent' and 'rows' are equivalent and will be unified to 'limit' here
    if recent and rows:
        raise exceptions.InvalidQueryParams(detail="Query may not contain both 'recent' and 'rows'")
    limit = rows or recent

    if limit:
        try:
            rows_per_page = int(limit)
        except ValueError:
            raise exceptions.InvalidQueryParams(detail="Query params 'recent' and 'rows' have to be integers")

        if rows_per_page > settings.PAGINATION_MAX_ROWS_LIMIT:
            raise exceptions.InvalidQueryParams(detail="Query params 'recent' and 'rows' may not exceed %s. "
                                                "Use the additional param 'page=2', 'page=3', etc. to access "
                                                "more objects" % settings.PAGINATION_MAX_ROWS_LIMIT)
        try:
            page = 1 if page is None else int(page)
        except ValueError:
            raise exceptions.InvalidQueryParams(detail="Query param 'page' has to be an integer")

        skip = rows_per_page * (page - 1)

        query.limit = rows_per_page
        query.skip = skip

    query.mapkey_range = [
        [type, id, company, workshop],
        [type, id + query.STRING_RANGE_END, company + query.STRING_RANGE_END, store + query.STRING_RANGE_END]
    ]

    rows = cb.query(view['doc'], view['view'], include_docs=include_docs, query=query)

    if format_function is None:
        format_function = self.format_function_default

    return_array = format_function(rows)
    return return_array

It works flawlessly when only querying for a certain type, or a type and an id range.

But if I e.g. want to have all docs of a certain type belonging to a company, disregarding id and store, also docs of other companies are delivered.

I tried by:

query.mapkey_range = [
    ["Vehicle", "", "abc", ""]
    ["Vehicle", q.STRING_RANGE_END, "abc", q.STRING_RANGE_END]
]

I know, somehow the order of the values in the composite key matters, thats why the query for an id range probably is succesful.

But I could not find any detailed explanation how the order matters and how to handle this use case.

Any idea or hint how to cope with this? Thank you in advance.

Upvotes: 0

Views: 367

Answers (2)

David Ostrovsky
David Ostrovsky

Reputation: 2481

You have two options for querying your documents by a variable number/order of fields:

  1. Use a multidimentional view (aka. spatial view), which lets you omit parts of the compound key in the query. Here is an example of using such a view: http://developer.couchbase.com/documentation/server/4.0/views/sv-example2.html
  2. Use N1QL, which lets you actually query on any number of fields dynamically. Make sure you add indexes for the fields you intend to query, and use the EXPLAIN statement to check that your queries execute as you expect them to. Here is how you use N1QL in Python: http://developer.couchbase.com/documentation/server/4.0/sdks/python-2.0/n1ql-queries.html

As you've already discovered, you cannot use a regular view, because you can only query it by the exact order of fields in your compound key.

Upvotes: 1

Simon Baslé
Simon Baslé

Reputation: 28301

with compound keys, the order in emit determines the internal "sorting" of the index. When using range query, this order is used.

In your case:

  • index contains all Vehicles
  • all the Vehicles are then sorted by id
  • for each similar id, Vehicles are sorted by company
  • for each similar id and company, Vehicles are then sorted by store

Let's take an example of 4 vehicles. Here is what the index would look like:

Vehicle,a,ACME,store100
Vehicle,c,StackOverflow,store1001
Vehicle,d,ACME,store100
Vehicle,e,StackOverflow,store999

Here is what happens with a range query:

  • The view engine finds the first row >= to the startKey from your range
  • It then finds the last one that is <= to the endKey of your range
  • It returns every row in between in the array

You can see how, depending on the ids, this can lead to seemingly bad results: for [["Vehicle", "", "ACME", ""], ["Vehicle", RANGE_END, "ACME", RANGE_END]] here is what happens:

  • row 1 (a) is identified as the lowest matching the startKey
  • row 4 (e) doesn't match the endKey, because "Vehicle,e,StackOverflow,store999" is greater than "Vehicle,RANGE_END,ACME,RANGE_END" due to the third component
  • row 3 (d) is the upper bound: Vehicle <= Vehicle, d <= RANGE_END, ACME <= ACME, store100 <= RANGE_END
  • hence row 1-3 are returned, including row 2 from "StackOverflow"

TL/DR: Ordering in the emit matters, you cannot query with sparse "jokers" in the left side of the compound key.

Change the map function to emit(doc.type, doc.company, doc.store, id) (most generic to least generic attribute) and it should work fine after you rework your query accordingly.

Here is a link from the doc explaining compound keys and ranges with dates: Partial Selection With Compound Keys

Upvotes: 1

Related Questions