Dean Hiller
Dean Hiller

Reputation: 20210

porting index logic and join logic from H2 database, but any good information on indexes?

So, we are about to port H2 code to a have a noSQL store instead of on the filesystem for our use in a large system(though with trillions of smaller indexes).

When looking at lucene and H2, at first glance it almost looks like they both use one b-tree if you index 4 columns(say A, B, C, D), instead of 4 b-trees. I am a bit confused there as that would mean, I would need to query on A or this would break down, correct? or am I mistaken and there is actually 4 b-trees, and when I do a join, that would mean there could be 8 b-trees I need to deal with or something.

Is there any good articles on how this works in detail? or can someone recommend some good book on this subject?

(I was an electrical engineer in school so never had that database programming class :( kinda regret that but shouldn't be too hard to catch up).

thanks, Dean

Upvotes: 2

Views: 109

Answers (1)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52157

All SQL DBMSes I know of have only one B-Tree per whole composite index. I'm guessing any other system that has a concept of "composite index" behaves the same.

In case of a composite index on {A, B, C, D}, this one B-Tree will allow you to search efficiently for...

  • A = ...
  • A = ... AND B = ...
  • A = ... AND B = ... AND C = ...
  • A = ... AND B = ... AND C = ... AND D = ...

...and the similar range searches.

It will be somewhat efficient for:

  • A = ... AND C = ...
  • A = ... AND D = ...
  • A = ... AND C = ... AND D = ...
  • A = ... AND B = ... AND D = ...

And will be inefficient for:

  • B = ...
  • B = ... AND C = ...
  • B = ... AND D = ...
  • B = ... AND C = ... AND D = ...
  • C = ...
  • D = ...
  • etc...

In other words searching on the leading edge of the index is efficient (though some DBMSes, such as Oracle, can use "skip scan" for non-leading-edge searches).


On the other hand, having separate (non-composite) indexes on {A}, {B}, {C} and {D}, would lead to four B-Trees and a different set of performance characteristics.

For a good introduction on how database indexes work, take a look at the Anatomy of an SQL Index.

Upvotes: 3

Related Questions