Lucian

Reputation: 824

KSQLDB Pull Performance

I am aware that each partition in KSQLDB is backed by its own RocksDB table. KSQLDB also repartitions so that records with the same key are stored in the same partition.

But I can't find any answer regarding query performance. How efficient is a KSQLDB pull query? Does it scan the whole table? Does it look up the key through an index in RocksDB? You can disable table scans, but what is the default behaviour?

Is it safe to assume that, since it is backed by RocksDB, which is a key/value store, it will look up the key directly, without any intermediary KSQLDB operation and without scanning?

Upvotes: 0

Views: 473

Answers (1)

Aniket Chopade

Reputation: 841

RocksDB is built on LSM trees (and SSTables). It is a key-value data store.

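Any LSM-based database stores data in two levels:

  1. A sorted structure in RAM, the memtable (a red-black tree, for example; RocksDB uses a skip list by default)
  2. Sorted String Tables (SSTables) on disk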
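
To make the two levels concrete, here is a minimal Python sketch of a toy LSM-style store. It is not how RocksDB is implemented: `TinyLSM`, `memtable_limit`, and the in-memory lists standing in for on-disk segments are all made up for illustration.

```python
import bisect


class TinyLSM:
    """Toy two-level store: a memtable in RAM plus sorted, immutable segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}      # stand-in for a sorted in-memory structure
        self.segments = []      # each segment: sorted list of (key, value), newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: persist the memtable as one sorted, immutable segment.
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Level 1: check the memtable first.
        if key in self.memtable:
            return self.memtable[key]
        # Level 2: check segments from newest to oldest, binary searching each.
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None


store = TinyLSM(memtable_limit=2)
store.put("dog", 1)
store.put("cat", 2)   # second write triggers a flush to a segment
store.put("dog", 3)   # newer value stays in the memtable
print(store.get("dog"), store.get("cat"), store.get("emu"))
```

Reads always consult the memtable before any segment, so the newest value for a key wins without rewriting older segments.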

For lookups on disk, it uses a sparse index. An SSTable, as the name indicates, is a sorted array of keys persisted on disk.

Suppose we look up the key "dollar" in a segment whose sparse index contains the keys "dog" and "downgrade".

Lookup steps:

  1. Look for "dollar" in the red-black tree (the memtable). If it is not there, proceed to disk.
  2. On disk, a binary search over the sparse index finds that the key "dollar" falls between "dog" and "downgrade".
  3. The index entries for those two keys give offsets into the SSTable, a physical file on the drive (17208 and 19504 in this example), so only that range has to be read.
  4. All entries in an SSTable are sorted, so a binary search is applied again within that range to find the value.
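
The disk-side steps can be sketched roughly as follows. This is an illustration only: the segment is a Python list rather than a file, list positions stand in for byte offsets, and `build_sparse_index` and `lookup` are hypothetical helper names, not RocksDB APIs.

```python
import bisect

# Hypothetical sorted segment; a real SSTable would be bytes in a file on disk.
segment = [
    ("dog", 10), ("doll", 20), ("dollar", 30),
    ("dome", 40), ("downgrade", 50), ("dozen", 60),
]


def build_sparse_index(segment, every=4):
    """Keep only every `every`-th key together with its position in the segment."""
    return [(segment[i][0], i) for i in range(0, len(segment), every)]


def lookup(segment, sparse_index, key):
    index_keys = [k for k, _ in sparse_index]
    # Binary search the sparse index for the last indexed key <= key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first key in this segment
    start = sparse_index[slot][1]
    end = sparse_index[slot + 1][1] if slot + 1 < len(sparse_index) else len(segment)
    # Only the range [start, end) is read; it is sorted, so binary search again.
    block_keys = [k for k, _ in segment[start:end]]
    i = bisect.bisect_left(block_keys, key)
    if i < len(block_keys) and block_keys[i] == key:
        return segment[start + i][1]
    return None


sparse_index = build_sparse_index(segment)  # [("dog", 0), ("downgrade", 4)]
print(lookup(segment, sparse_index, "dollar"))
```

With `every=4`, the index holds only "dog" and "downgrade", so a lookup for "dollar" touches just the four entries between them.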


So, as you can see, there is no full scan.

For non-existent keys, it uses a Bloom filter to determine that the key does not exist, so it does not have to search every segment.
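
As a rough illustration of the idea, here is a small Bloom filter in Python. The class, its sizes, and the SHA-256 based hashing are assumptions chosen for the sketch; RocksDB's own filter implementation differs.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may return false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key is definitely absent, so the segment can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))


segment_filter = BloomFilter()
for key in ("dog", "dollar", "downgrade"):
    segment_filter.add(key)

print(segment_filter.might_contain("dollar"))
```

A `False` answer lets the store skip a segment entirely; a `True` answer only means the segment is worth checking, since false positives are possible.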

Upvotes: 1
