Zifre
Zifre

Reputation: 27008

Database query time complexity

In modern databases, if I use an index to access a row, this will be O(1) complexity.

But if I do a query to select another column, will it be O(1) or O(n)?

Does the database have to iterate through all the rows?

Or does it build a sorted list for each column?

Upvotes: 36

Views: 35874

Answers (8)

YHTAN
YHTAN

Reputation: 686

Table without index, data save on non-order structure. When you want to search for some data, it would use the "Scan" to go check through all the data from start till end on the table.

Case 1: query table without index, query 1 record, SQL query plan step: "table scan" all data, O(N)

Case 2: query table without index, query many records, SQL query plan step: "table scan" all data, O(N)

Table with index, data will save on B-tree structure, which when you want to search for 1 data (on the indexed column), it would make use of the B-tree structure to find the data.

Case 3: query table with on indexed column, query 1 record, SQL query plan step: "index seek", O(LogN)

Case 4: query table with on indexed column, query many record,

2 possible, the SQL Query Optimizer will made use of the "index statistics" to compute and determine which action step is faster to be use.

  • (a) SQL query plan step: "index Scan", O(N)
  • (b) SQL query plan step: "index seek", O(R LogN) [R for number of records]

Upvotes: 0

Shady Mohammed
Shady Mohammed

Reputation: 39

B-trees do not yield O(logN), that is the complexity of a binary tree.

A B-Tree is organised such that it has an entire block per node, thus once a node is found a single I/O operation can read an entire block.

With the number of items per node = blocking factor (#records/block){bfr}, a B-Tree optimized search will yield O(log bfr÷2 +1 N) I/O operations instead of O(N) I/O operations seeking a record by key.

Upvotes: 3

Wim Coenen
Wim Coenen

Reputation: 66753

I don't know the answer, but keep in mind that big-O notation only gives you an indication of performance for data-set sizes which are arbitrarily large.

For example, the bottleneck for database performance is typically disk seeks. Therefore, performance is greatly increased if the working data-set can be kept in memory. Big-O notation won't tell you anything about such optimizations, because they are only relevant for finite data-sets.

Upvotes: 4

Paco
Paco

Reputation: 8381

There are different types of indexes, different execution plans and different implementations for different databases. Most of the code of relations database is in search-optimising algorithms. There is not a single answer to your question. You can use a tool to visualise the execution plan when you want to know how a query is going to be executed.

Upvotes: 0

David Schmitt
David Schmitt

Reputation: 59355

To answer your literal question, yes if there is no index on a column, the database engine will have to look at all rows.

In the more interesting case of selecting by multiple columns, both with and without index, the situation becomes more complex: If the Query Optimizer chooses to use the index, then it'll first select rows based on the index and then apply a filter with the remaining constraints. Thus reducing the second filtering operation from O(number of rows) to O(number of selected rows by index). The ratio between these two number is called selectivity and an important statistic when choosing which index to use.

Upvotes: 13

Tjofras
Tjofras

Reputation: 2096

Indexes are per column, so if you use a where clause on a un-indexed column it will do a so called tablescan which is O(n).

Upvotes: 4

Dana
Dana

Reputation: 32997

Actually, I think access based on an index will be O(log(n)), because you'll still be searching down through a B-Tree-esque organization to get to your record.

Upvotes: 40

Lancelot
Lancelot

Reputation: 2427

You have indexes. Clustered indexes are physically sorted on the disk, you can have only one per table. Unclustered indexes are logically sorted and you can have many of those (careful not to abuse it either, it might slow down write actions). If there is no index on your column then I believe it's the good old row by row method.

Upvotes: 0

Related Questions