Reputation: 8580
I have looking around the net for a decent tutorial or an article that simply formally define what every keyword in SQL (for beginners at least) means, and how does it behave combined with other SQL keywords. I want to know what is being executed before what, how is it done and how is it formally defined to behave.
All I got is tons of examples that I waste time on rather than just know what a keyword really does precisely and a little about how it gets done would be nice to know as well.
Why is it not trivial that people want to know that? And why does every book and every site I opened that is about SQL is teaching by examples rather then formally giving the definitions of every keyword just like in core classes methods implementations of Java for example.
I just don't get it; can you explain why?
Upvotes: 0
Views: 1312
Reputation: 57023
There's an old joke that SQL stands for "Scarcely Qualifies as a Language". A SQL query is more like a specification than a program: you describe what you want -- and in common with natural language is are more than one way way to say the same thing -- and the system is free to implement it in any way it sees fit as long as it fulfils the requirements**.
I can thoroughly recommend the book, SQL and Relational Theory: How to Write Accurate SQL Code By C. J. Date. Although it probably doesn't satisfy your requirements precisely it does explain SQL's theoretical base and its deviations from it.
One thing to note is that the SQL standard had evolved over the years and nothing is ever deprecated (known as "the Shackles of Compatibility"), resulting in (IMO) an unintuitive, user-unfriendly language. SQL's rigid SELECT..FROM..WHERE
, which is logically executed FROM..WHERE..SELECT
, meaning projection in SQL is verbose and 'expensive', is just one example of SQL's lack of flexibility.
i really find it hard to believe that by now no one really understand enough of SQL to be completely able to difine exactly what every key word does, and in what order.
Joe Celko writes about this kind of stuff a lot. Here are some exact phrases to google for:
"Here is how a SELECT works in SQL ... at least in theory."
"Here is how OUTER JOINs work in SQL-92"
"The correct syntax for a searched update statement is"
"CASE is not a switch; it is an **expression** in SQL"
That said I've seen SQL Server bugs closed as 'won't fixed' because although the results are wrong the optimization is spot on!
Upvotes: 4
Reputation: 753705
There are at least a couple of issues:
You can find the BNF (Backus-Naur Format) grammars for some versions of the standard available here. These are heavily hyperlinked HTML. However, the standard is not just the SQL grammar; there are lots (and lots, and lots) of rules about what is allowed where and when (and not usually much explanation of why) in the rest of the standard. The standard is almost impossibly opaque at times.
Here's a tame example from ISO/IEC 9075-2:2003 (E) — that's SQL/Foundation for SQL-2003:
10.7
<collate clause>
Function
Specify a default collation.
Format
<collate clause> ::= COLLATE <collation name>
Syntax Rules
1) Let C be the
<collation name>
contained in the<collate clause>
. The schema identified by the explicit or implicit qualifier of the<collation name>
shall include the descriptor of C.Access Rules
1) Case:
a) If
<collate clause>
is contained, without an intervening<SQL routine spec>
that specifies SQL SECURITY INVOKER, in an<SQL schema statement>
, then the applicable privileges of the<authorization identifier>
that owns the containing schema shall include USAGE on C.b) Otherwise, the current privileges shall include USAGE on C.
NOTE 228 — “applicable privileges” and “current privileges” are defined in Subclause 12.3, “
<privileges>
”.General Rules
None.
Conformance Rules
1) Without Feature F690, “Collation support”, conforming SQL language shall not contain a
<collate clause>
.
A less tame example is that <cast specification>
has 16 pages of gobbledygook describing it. This is from about 2/3 of the way through. It is 'General Rule' number 16 (out of 20):
16) If TD is the datetime data type TIME WITH TIME ZONE, then let TSP be the
<time precision>
of TD.Case:
a) If SD is character string, then SV is replaced by:
TRIM ( BOTH ' ' FROM VE )
Case:
i) If the rules for
<literal>
or for<unquoted time string>
in Subclause 5.3, “<literal>
”, can be applied to SV to determine a valid value of the data type TD, then let TV be that value.ii) If the rules for
<literal>
or for<unquoted time string>
in Subclause 5.3, “<literal>
”, can be applied to SV to determine a valid value of the data type TIME(TSP) WITHOUT TIME ZONE, then let TV1 be that value and let TV be the value of:CAST ( TV1 AS TIME(TSP) WITH TIME ZONE )
iii) If a
<datetime value>
does not conform to the natural rules for dates or times according to the Gregorian calendar, then an exception condition is raised: data exception — invalid datetime format.iv) Otherwise, an exception condition is raised: data exception — invalid character value for cast.
b) If SD is TIME WITH TIME ZONE, then TV is SV, with implementation-defined rounding or truncation if necessary.
c) If SD is TIME WITHOUT TIME ZONE, then the UTC component of TV is SV – STZD, computed modulo 24 hours, with implementation-defined rounding or truncation if necessary, and the time zone component of TV is STZD.
d) If SD is TIMESTAMP WITH TIME ZONE, then the UTC component of TV is the hour, minute, and second
<primary datetime field>
s of SV, with implementation-defined rounding or truncation if necessary, and the time zone component of TV is the time zone displacement of SV.e) If SD is TIMESTAMP WITHOUT TIME ZONE, then TV is:
CAST ( CAST ( SV AS TIMESTAMP(TSP) WITH TIME ZONE ) AS TIME(TSP) WITH TIME ZONE )
The indentation is a little better in the standard, and you'd have a little more context for some of the names (such as TV, SV, SD, etc.), but the language used is really that turgid.
Upvotes: 2