Redis ZRANGEBYLEX command complexity

Question

According documentation section for ZRANGEBYLEX command, there is following information. If store keys in ordered set with zero score, later keys can be retrieved with lexicographical order. And ZRANGEBYLEX operation complexity will be O(log(N)+M), where N is total elements count and M is result set size. Documentation has some information about string comparation, but tells nothing about structure, in which elements will be stored.

But after some experiments and reading source code, it's probably what ZRANGEBYLEX operation has a linear time search, when every element in ziplist will be matched against request. If so, complexity will be more larger than described above - about O(N), because every element in ziplist will be scanned.

After debugging with gdb, it's clean that ZRANGEBYLEX command is implemented in genericZrangebylexCommand function. Control flow continues at eptr = zzlFirstInLexRange(zl,&range);, so major work for element retrieving will be performed at zzlFirstInLexRange function. All namings and following control flow consider that ziplist structure is used, and all comparation with input operands are done sequentially element by element.
Inspecting memory with analysis after inserting well-known keys in redis store, it seems that ZSET elements are really stored in ziplist - byte-per-byte comparation with gauge confirm it.

So question - how can documentation be wrong and propagate logarithmic complexity where linear one appears? Or maybe ZRANGEBYLEX command works slightly different? Thanks in advance.

Itamar Haber · Accepted Answer

how can documentation be wrong and propagate logarithmic complexity where linear one appears?

The documentation has been wrong on more than a few occasions, but it is an ongoing open source effort that you can contribute to via the repository (https://github.com/antirez/redis-doc).

Or maybe ZRANGEBYLEX command works slightly different?

Your conclusion is correct in the sense that Sorted Set search operations, whether lexicographical or not, exhibit linear time complexity when Ziplists are used for encoding them.

However.

Ziplists are an optimization that prefers CPU to memory, meaning it is meant for use on small sets (i.e. low N values). It is controlled via configuration (see the zset-max-ziplist-entries and zset-max-ziplist-value directives), and once the data grows above the specified thresholds the ziplist encoding is converted to a skip list.

Because ziplists are small (little Ns), their complexity can be assumed to be constant, i.e. O(1). On the other hand, due to their nature, skip lists exhibit logarithmic search time. IMO that means that the documentation's integrity remains intact, as it provides the worst case complexity.

Redis ZRANGEBYLEX command complexity

Answers (1)

Related Questions