Reputation: 3470
Ok this is slightly theoretical so it would be great if an unbiased database enthusiast gave an opinion.
For the sake of argument let's agree that there is such a concept as a "base table" w.r.t. to a query, where one table is driving the majority of information of the result set. Imagine a query where there are three relations - TableA, TableB, and TableC
Let's say TableA has cardinality of 1 million records and TableC has 500 records and TableC has 10,000.
Let's say the query is like so -
SELECT A.Col1
, A.Col2
, A.Col3
, A.Col4
, A.Col5
FROM TableA A
LEFT JOIN TableB B ON B.ID = A.TableBID
LEFT JOIN TableC C ON C.ID = A.TableCID
Ok, clearly TableA is the base relation above. It is the biggest table, it is driving the result set by being joined "from", and visually the columns are even on the "left side" of the result set. (The left side thing actually was a criterion to my colleague).
Now, let's assume that TableA has 1 million rows again, TableB is a "junction" or "bridge" table and has like 500,000 rows and TableC has 1,000,000 rows. So assume the query is just an outer join to get all columns in TableA and TableC where a relationship exists like below...
SELECT A.*
, C.*
FROM TableC C
FULL OUTER JOIN TableB B ON C.ID = B.TableAID
FULL OUTER JOIN TableA A ON A.ID = B.TableCID
Ok so given the last query, can anyone tell me what the "base relation" is? I don't think there is one, but was hoping for another database person's opinion.
Upvotes: 0
Views: 1949
Reputation: 15157
A base table is a given named table-valued variable--a database table. That's it. In a query expression its name is a leaf expression denoting its value. "Given table variable" would be more descriptive. A query can use literal notation for a table. It would be reasonable for a given named table-valued constant to also be called "base". It's nothing about some kind of "main" table.
The relational model is founded on a table holding the rows that make a true proposition (statement) from its (characteristic) predicate (statement template parameterized by column names). We give base table rows & get query expression rows.
A query expression that is a base table name comes with a predicate given by the designer.
/* (person, liked) rows where [liker] likes [liked] */ /* (person, liked) rows where Likes(liker, liked) */ SELECT * FROM Likes
A query expression that is a table literal has a certain predicate in terms of columns being equal to values.
/* (person) rows where
person = 'Bob'
*/
SELECT * FROM (VALUES ('Bob')) dummy (person)
Otherwise a query expression has a predicate built from its constituent table expression predicates according to its relation operator.
- Every algebra operator corresponds to a certain logic operator.
NATURAL JOIN
&AND
RESTRICT
theta
&AND
theta
UNION
&OR
MINUS
&AND NOT
PROJECT
all but
C
&EXISTS C
etc
/* (person) rows where (FOR SOME liked, Likes(person, liked)) OR person = 'Bob' */ SELECT liker AS person FROM Likes UNION VALUES ('Bob')
/* (person, liked) rows where FOR SOME [values for] l1.*, l2.*, person = l1.liker AND liked = l2.liked AND Likes(l1.liker, l1.liked) AND Likes(l2.liker, l2.liked) AND l1.liked = l2.liker AND person = 'Bob' AND NOT Likes(l1.liked, 'Ed') */ Likes l1 INNER JOIN Likes l2 ON l1.liked = l2.liker WHERE l1.liker = 'Bob' AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)
There's no difference to how a base, literal or operator-call query expression is used in determining a containing query expression's predicate.
Is there any rule of thumb to construct SQL query from a human-readable description?
Relational algebra - recode column values
Upvotes: 3
Reputation: 25534
The concept of a "driving" table is really an assumption about how the DBMS is expected to execute a query internally. A rule-based query optimizer, in the absence of any index-related preferences, may treat the ordering of tables and joins in a query as significant when it comes to choosing the execution plan. Under a cost-based optimizer, there is no significance to the order of tables and joins so nothing about the structure of the query itself will tell you which table gets read first or in what order the join conditions get evaluated.
When conceptualizing a query it may help to have a mental image of one table being the starting point for the query but I think the answer to the question here must be no. Logically speaking there is no such thing as a driving table.
Upvotes: 3
Reputation: 1271051
The term "base table" has a definition and it has nothing to do with what you describe. A "base table" is pretty much just a "table". That is, it is not a view, it is not a table valued function, it is not the result of a query. It is what gets stored in the database as an explicit table.
What you are seem to be grasping for seems more related to optimization strategies. I have used similar terminology -- in the context of optimization -- to describe the "driving table" being accessed by the optimizer. The purpose of this is to distinguish between different execution plans.
Consider the query:
from t1 join t2 using (col)
There are multiple different execution plans. Here are some methods and what might be considered the "driving table" (if any) for them:
for each row in t1
for each row in t2
compare col
--> t1 is the "driving table"
for each row in t2
for each row in t1
compare col
--> t2 is the "driving table"
for each row in t1
look up t2 value using index on t2(col)
--> t1 is the "driving table"
sort t1 by col
sort t2 by col
compare the rows in the two sorted sets
--> no "driving table"
hash t1 by col
hash t2 by col
compare the hash maps
--> no "driving table"
In other words, the "driving" table has little to do with the query structure. It is based on the optimization strategies used for the query. That said, left join
s and right join
s limit the optimization paths. So, in a nested loop or index-lookup situation, the "first" (or "last") table would be the driving table.
Upvotes: 4
Reputation: 222682
Let me suggest a perspective where the base table is the first one in the FROM
clause (ie not a JOIN
ed table). In the case where a statement can be equally written with either one table or another as base table, we would say that there are two (or more) base tables.
In your first query, the base table is TableA
. If you invert TableA
and TableC
in the query, you are not guaranteed to get the same results, because of the LEFT JOIN
.
In the second query, as you are using FULL JOIN
s, all 3 tables could be inverted without changing the result, so this is indeed a use case of a query where all tables are base tables.
Upvotes: 2