Reputation: 22270
What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE
clause)?
Is it O(log N)? O(N*Log N)? Something else?
How about for a multiple query, like searching for multiple WHERE
clauses in a key-value database?
The wikipedia page on LSM trees is currently lacking this info.
And I'm trying to make sense of the original paper.
Upvotes: 3
Views: 1620
Reputation: 193
I have been wondering the same.
If you have a series of trees, getting smaller by a constant factor each time, and you need to search them all for a single key, the cost seems to be O(log(N)^2).
Say the first (binary) tree takes log_2(N) branches to reach a node. The second might be half the size, and take (log_2(N) - 1) branches to find a node. The smallest tree will be some O(1) constant in size and there are roughly log_2(N) trees total. Summing the series gives O(log_2(N)^2).
However, I'm wondering if there is some more clever scheme where arbitrary single-key lookups, insertions or deletions have amortized cost O(log(N)), but haven't been able to find an answer (yet).
Upvotes: 1
Reputation: 8928
For a simple search indexed by a LSM tree, it is O(log n). This is because the biggest tree in the LSM tree is a B tree, which is O(log n), and the other trees are subsets of B trees or in the case of in memory trees, more efficient trees, which are no worse than O(log n). The number of trees is a constant, so it doesn't affect the order of the search time.
Upvotes: -1