Reputation: 20046

How does a DBMS implement their own sorting algorithm? Or do they?

When an SQL statement is translated to C by a parser such as YACC or Bison, does the translated C code contain the sorting algorithm itself? I do not understand how sorting is implemented in a DBMS (such as MySQL or Microsoft SQL Server). Is the algorithm part of the grammar parser? Or is the algorithm applied to a result set only after it has been fetched by an SQL query, rather than directly to data in memory? Or is the sorting algorithm an ISO standard, so that every DBMS is required to use the same algorithm?

I did my research and googling but found no clear answer. Without having to read a whole book on database internals, could someone explain the concept clearly?

Upvotes: 4

Answers (3)

pojo-guy

Reputation: 979

As in so many things, it depends.

What the ISO standards define is that when a sort order is requested, it is honored in particular ways. The mechanics of meeting that standard are up to the implementation. With that said, sorting has been a heavily studied branch of computing for nearly half a century, and there are a small number of algorithms that are known to work well, plus minor variations that amount to fine tuning.

Lex, YACC, and Bison don't do much besides pull out the intent of the supplied code. You can identify nouns, predicates and verbs in the supplied code, but the output doesn't actually do anything until it is passed to an interpreter of some sort.

In the RDBMS, the interpreter hiding under the parser and lexer takes those nouns, predicates and verbs and computes an idealized access path to the data, taking into account the optimizations (proprietary or not) of the platform. The access path is executed as a list of verbs.

However, the interpreter does not have to be an RDBMS. It might be a tool for managing metadata, in which case the result might be a graphical image of entity relationships (as an example).

Most databases use several different sorting algorithms depending on what they are sorting, and in what phase of the information lifecycle they are applying the sort.

When creating an ordered index from bulk data, they may use a tree sort or a heap sort.

When selecting data, the first choice is to choose an access path that allows traversal of an index that naturally returns the data in the order you requested (i.e. avoid sorting).
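To illustrate the "avoid sorting" point, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in chosen because it ships with Python; the table `people`, the index `idx_people_name`, and the helper `plan_for` are hypothetical names, and other engines report their plans differently. The sketch compares the query plan for the same `ORDER BY` query before and after an index on the sort column exists.

```python
import sqlite3

def plan_for(conn, sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("carol",), ("alice",), ("bob",)],
)

query = "SELECT name FROM people ORDER BY name"

# No index on name yet: the engine has to sort the rows itself.
plan_without_index = plan_for(conn, query)

# With an index on name, the rows can be read in index order instead.
conn.execute("CREATE INDEX idx_people_name ON people (name)")
plan_with_index = plan_for(conn, query)

names = [row[0] for row in conn.execute(query)]

print(plan_without_index)
print(plan_with_index)
print(names)
```

In SQLite the first plan typically mentions a temporary B-tree for the `ORDER BY`, while the second refers to the index and drops the explicit sort step. The query result is the same either way; only the access path changes.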

If the dataset must be sorted after retrieval, and it is sufficiently small to fit into memory, they will typically use some flavor of QuickSort.

If the dataset must be sorted after retrieval, and it is too large to fit into memory, they may create a temporary table and use either heap sort or tree sort.
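As a rough illustration of sorting data that does not fit into memory, here is a sketch of an external merge sort, which is a different technique from the heap sort or tree sort mentioned above and is not claimed to be what any particular DBMS does. It sorts fixed-size chunks in memory, spills each sorted chunk to a temporary file, and then merges the files lazily. The function name `external_sort` and the `chunk_size` parameter are hypothetical, and the values are assumed to be integers.

```python
import heapq
import tempfile

def external_sort(values, chunk_size):
    """Yield integers in ascending order, holding at most chunk_size of them
    in memory at once during the run-building phase."""
    run_files = []

    def spill(chunk):
        # Sort one chunk in memory and write it out as a sorted "run".
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in sorted(chunk))
        run.seek(0)
        run_files.append(run)

    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            spill(chunk)
            chunk = []
    if chunk:
        spill(chunk)

    # Each run is read back line by line; heapq.merge keeps only the
    # current head of every run in memory while merging.
    runs = [(int(line) for line in f) for f in run_files]
    try:
        yield from heapq.merge(*runs)
    finally:
        for f in run_files:
            f.close()

print(list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)))
```

A real engine would work on pages and rows rather than text lines, and would bound the number of runs merged at once, but the two phases (build sorted runs, then merge them) are the same idea.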

I hope this helps.

Upvotes: 1

Niels Keurentjes

Reputation: 41958

The sorting algorithm is most certainly not part of the grammar parser; it's technically an 'implementation detail'. It's a rather important one though, as it can fundamentally impact the performance of complex queries. The term 'implementation detail' simply means that it's up to the DBMS vendor to decide what to do and how to do it.

It could even be partially delegated to the query optimizer, as the common sorting algorithms like heapsort, mergesort, quicksort et al. all have different 'best case scenarios'. Some perform notably better on 'mostly sorted data', and others are extremely slow on 'extremely unsorted data'. As indexes could contain hints about that, a very smart DBMS could even pick a different sorting algorithm based on the data at hand; see this Wikipedia writeup for a comparison. To my knowledge none of the current vendors do that though.
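To make the 'best case scenario' point concrete, here is a small sketch that counts comparisons made by a plain insertion sort on already-sorted versus reversed input. Insertion sort is used only because its sensitivity to input order is easy to see; nothing here implies a DBMS uses it, and the function name `insertion_sort_comparisons` is hypothetical.

```python
def insertion_sort_comparisons(items):
    """Sort a copy of items with insertion sort and count key comparisons."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Shift larger elements right until current's position is found.
        while j >= 0:
            comparisons += 1
            if data[j] <= current:
                break
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current
    return data, comparisons

n = 100
_, sorted_cost = insertion_sort_comparisons(range(n))
_, reversed_cost = insertion_sort_comparisons(range(n, 0, -1))
print(sorted_cost, reversed_cost)
```

On sorted input each element needs a single comparison, so the cost grows linearly with n; on reversed input every element is compared with all of its predecessors, so the cost grows quadratically.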

So in the end, what sorting algorithms are used when is just a black box from the programmer's perspective. All you (should) care about is that the output is sorted correctly.

Upvotes: 2

Mureinik

Reputation: 311468

The SQL standard does not include any specification of how sorting should be done. When you issue a query with ORDER BY, it's the database's responsibility to return the result in the specified order, but each database is free to implement this however it sees fit.

Upvotes: 4
