Reputation: 6826

Performance difference in Couchbase's get by Key and select by index

While benchmarking our Couchbase DB, we compared looking up items by their id/key with retrieving items through a query that uses a secondary index.

Based on this article about indexing and performance in Couchbase, we expected the two to perform the same.

However, our tests showed that the search by key/id was sometimes much faster than the search that uses the secondary index.

For example, ~3 ms to search using the index and ~0.3 ms to search by key (a factor of 10).

The point is that this difference is not consistent: the search by key varies from 0.3 ms to 15 ms.
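To see the spread rather than a single number, it can help to record many samples per access path and compare percentiles. A minimal Python sketch, assuming each lookup (key GET or indexed query) is wrapped in a zero-argument callable. The in-memory `store` below is a hypothetical stand-in, not a Couchbase client:

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies_ms(op: Callable[[], object], runs: int = 1000) -> List[float]:
    """Call `op` `runs` times and record each call's wall-clock latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Summarize the distribution: min, median, nearest-rank p99 and max."""
    ordered = sorted(samples_ms)
    p99_rank = math.ceil(0.99 * len(ordered))  # nearest-rank method, 1-based
    return {
        "min": ordered[0],
        "p50": statistics.median(ordered),
        "p99": ordered[p99_rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real key lookup; swap in your SDK call here.
    store = {"user::000000000001": {"userId": "000000000001"}}
    stats = summarize(measure_latencies_ms(lambda: store.get("user::000000000001")))
    print({name: round(value, 4) for name, value in stats.items()})
```

Comparing p50 against p99 and max for both access paths separates the typical cost from occasional outliers such as the 15 ms key lookups.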

We are wondering:

Should search by key perform better than search by secondary index?
Should there be such a large time variation between key searches?

Upvotes: 2

Answers (3)

Benjamin Bryant

Reputation: 31

It is possible to achieve sub-millisecond secondary lookups at scale, but it requires some tuning of your query, your index, and possibly some of Couchbase's system parameters. Consider the following simple example:

Sample document in userBucket:

"user::000000000001" : { "email" : "[email protected]", "userId" : "000000000001" }

This query:

SELECT userId FROM userBucket WHERE email = "[email protected]" AND userId IS NOT NULL ;

...should be able to achieve sub-millisecond performance with a properly tuned secondary index:

CREATE INDEX idx01 ON userBucket(email, userId);

Because the index covers the query completely, the Query engine does not need to FETCH the document from the K/V store. However, "SELECT * ..." will always cause the Query service to FETCH the document and will therefore be slower than a simple K/V GET("user::000000000001").
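The covered versus non-covered distinction can be illustrated with a toy model. This is not Couchbase code: `ToyBucket` and its methods are hypothetical, the index is a plain list scanned linearly for brevity, and the email address is a placeholder. The model only counts how often the K/V store is touched:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyBucket:
    """Toy model: documents by key, plus an index of (email, userId, doc_key) entries."""
    docs: Dict[str, dict] = field(default_factory=dict)
    index: List[Tuple[str, str, str]] = field(default_factory=list)
    fetches: int = 0  # number of trips to the K/V store

    def upsert(self, key: str, doc: dict) -> None:
        # Toy simplification: stale index entries are not removed on update.
        self.docs[key] = doc
        self.index.append((doc["email"], doc["userId"], key))

    def fetch(self, key: str) -> dict:
        self.fetches += 1
        return self.docs[key]

    def select_user_id(self, email: str) -> List[str]:
        """Analogue of SELECT userId ... WHERE email = ?: answered from index entries alone."""
        return [user_id for (e, user_id, _) in self.index if e == email]

    def select_star(self, email: str) -> List[dict]:
        """Analogue of SELECT * ... WHERE email = ?: index yields keys, each doc is fetched."""
        return [self.fetch(key) for (e, _, key) in self.index if e == email]


bucket = ToyBucket()
bucket.upsert("user::000000000001", {"email": "alice@example.com", "userId": "000000000001"})
print(bucket.select_user_id("alice@example.com"), bucket.fetches)  # ['000000000001'] 0
print(len(bucket.select_star("alice@example.com")), bucket.fetches)  # 1 1
```

In this model the covered query never touches the K/V store, while the `SELECT *` analogue needs one fetch per matching document, which is the extra hop described above.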

For the best latencies, review your query plan (using the EXPLAIN statement) and make sure your query is not FETCHing: https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/explain.html
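Checking for a FETCH can also be automated by walking the JSON returned by EXPLAIN. A minimal sketch, assuming the plan is nested JSON whose operator nodes carry an `#operator` field (operator names such as `IndexScan3` vary by server version). The `sample` plan is hypothetical and heavily abbreviated:

```python
import json
from typing import Any, List


def operators(plan: Any) -> List[str]:
    """Collect every '#operator' value found anywhere in a nested EXPLAIN plan."""
    found = []
    if isinstance(plan, dict):
        if "#operator" in plan:
            found.append(plan["#operator"])
        for value in plan.values():
            found.extend(operators(value))
    elif isinstance(plan, list):
        for item in plan:
            found.extend(operators(item))
    return found


def uses_fetch(plan: Any) -> bool:
    """True if any operator in the plan is a Fetch, i.e. the query is not covered."""
    return "Fetch" in operators(plan)


# Hypothetical, abbreviated plan for a covered query (no Fetch operator).
sample = json.loads("""
{"plan": {"#operator": "Sequence", "~children": [
  {"#operator": "IndexScan3", "index": "idx01"},
  {"#operator": "Parallel", "~child": {"#operator": "Sequence", "~children": [
    {"#operator": "Filter"}, {"#operator": "InitialProject"}]}}]}}
""")
print(uses_fetch(sample))  # False
```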

Upvotes: 1

EbenH

Reputation: 566

To add to @deniswsrosa's answer, a lookup through a secondary index will always be slower, because the index must first be traversed based on your query to find the document key, and then a key lookup is performed. Doing just the key lookup is faster if you already have the key. The amount of work needed to traverse the index can vary depending on how selective the index is, whether the entire index is in memory, etc. Memory-optimized indexes can ensure that the whole index is in memory, if you have enough memory to support that.
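The two-step path (traverse the index to find keys, then look each key up) can be sketched as a toy model, not Couchbase's index implementation: a sorted list searched with `bisect` stands in for the index, and the `country` field and document keys are hypothetical. The counters show that a less selective value reads more index entries and triggers more key lookups:

```python
import bisect
from typing import Dict, List, Tuple


class ToyIndexedStore:
    """Toy model: a hash map of documents plus a sorted secondary index on one field."""

    def __init__(self, docs: Dict[str, dict], field_name: str):
        self.docs = docs
        # Sorted (field value, document key) pairs stand in for the index structure.
        self.index: List[Tuple[str, str]] = sorted(
            (doc[field_name], key) for key, doc in docs.items()
        )
        self.index_entries_read = 0
        self.key_lookups = 0

    def get_by_key(self, key: str) -> dict:
        self.key_lookups += 1  # one hash-map lookup
        return self.docs[key]

    def query_eq(self, value: str) -> List[dict]:
        # Step 1: binary-search the index for the first entry with this value...
        pos = bisect.bisect_left(self.index, (value, ""))
        results = []
        # ...then walk every matching entry; a less selective value means more entries.
        while pos < len(self.index) and self.index[pos][0] == value:
            self.index_entries_read += 1
            # Step 2: the index only yields the key, so the document is looked up by key.
            results.append(self.get_by_key(self.index[pos][1]))
            pos += 1
        return results


docs = {
    "user::1": {"country": "NL"},
    "user::2": {"country": "NL"},
    "user::3": {"country": "SE"},
}
store = ToyIndexedStore(docs, "country")
store.query_eq("NL")
print(store.index_entries_read, store.key_lookups)  # 2 2
```

A direct `get_by_key` skips step 1 entirely, which is why it wins when you already have the key.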

Of course, even a simple key lookup can be slower if the document in question is not in the cache and needs to be brought into memory from storage.
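The cache effect can be modeled as a read-through cache in front of storage. A toy sketch, assuming a plain LRU eviction policy, which is a simplification and not Couchbase's actual ejection policy; `disk` is just a dict standing in for storage:

```python
from collections import OrderedDict
from typing import Dict, Hashable


class ToyReadThroughCache:
    """Toy model: a small LRU cache in memory in front of slower storage."""

    def __init__(self, disk: Dict[Hashable, object], capacity: int):
        self.disk = disk
        self.capacity = capacity
        self.memory: "OrderedDict[Hashable, object]" = OrderedDict()
        self.disk_reads = 0

    def get(self, key: Hashable) -> object:
        if key in self.memory:
            self.memory.move_to_end(key)  # hit: mark as most recently used
            return self.memory[key]
        self.disk_reads += 1  # miss: the slow path through storage
        value = self.disk[key]
        self.memory[key] = value
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict the least recently used entry
        return value


cache = ToyReadThroughCache({"a": 1, "b": 2, "c": 3}, capacity=2)
for key in ["a", "a", "b", "c", "a"]:
    cache.get(key)
print(cache.disk_reads)  # 4
```

The second read of `"a"` is served from memory, but by the last read it has been evicted, so the same key costs a storage read again. Identical lookups can therefore land on very different latencies.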

Upvotes: 3

deniswsrosa

Reputation: 2460

The results you get are consistent with what I would expect. Couchbase works as a key-value store when you do any operation using the id. A key-value store is roughly a big distributed hashmap, and with this data structure you get very good performance on get/save/delete by id.

Whenever you store a new document, Couchbase hashes the key and assigns the document to a virtual bucket (vBucket, something similar to a shard). When you need to get the document back, the same algorithm tells the client which vBucket holds it. Since the SDK has the cluster map and knows exactly which node owns which vBuckets, your application requests the document directly from the node that owns it.
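The key-to-vBucket step can be sketched in Python. Assumptions, to be checked against your SDK and cluster: the key is hashed with CRC32, bits 16-30 of the result are taken and reduced modulo the vBucket count (the scheme used by libcouchbase-based clients), there are 1024 vBuckets (the usual count on Linux), and each `vbucket_map` entry lists the active node's index first, modeled on the cluster map. `vbucket_for_key` and `node_for_key` are hypothetical helper names:

```python
import zlib
from typing import List, Sequence

NUM_VBUCKETS = 1024  # assumption: usual count on Linux; read the real value from the cluster map


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash a document key to a vBucket id (assumed libcouchbase-style CRC32 scheme)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def node_for_key(key: str, vbucket_map: Sequence[Sequence[int]], servers: List[str]) -> str:
    """Return the node holding the active copy; vbucket_map[vb][0] is assumed to be its index."""
    vb = vbucket_for_key(key, len(vbucket_map))
    return servers[vbucket_map[vb][0]]


# Hypothetical two-node cluster: vBuckets alternate between the nodes, replica on the other.
servers = ["node-a:11210", "node-b:11210"]
vbucket_map = [[vb % 2, (vb + 1) % 2] for vb in range(NUM_VBUCKETS)]
key = "user::000000000001"
print(vbucket_for_key(key), node_for_key(key, vbucket_map, servers))
```

Because the mapping is a pure function of the key and the cluster map, the client can go straight to the owning node without consulting any other service first.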

On the other hand, when you query the database, Couchbase first has to work out internally which documents match and where they are located (the Query service consults the index and then typically fetches the matching documents); that is why operations by id are faster.

As for the variation from 0.3 ms to 15 ms, it is hard to tell without debugging your environment. However, a number of factors could contribute to it, e.g. whether the document is cached or not, an undersized node, etc.

Upvotes: 3
