sqldatabaserelational-databaserdbmsdatabase-theory

Reputation: 3470

Is there ALWAYS a "base table" in any database query?

Ok this is slightly theoretical so it would be great if an unbiased database enthusiast gave an opinion.

For the sake of argument let's agree that there is such a concept as a "base table" w.r.t. to a query, where one table is driving the majority of information of the result set. Imagine a query where there are three relations - TableA, TableB, and TableC

Let's say TableA has cardinality of 1 million records and TableC has 500 records and TableC has 10,000.

Let's say the query is like so -

SELECT A.Col1
     , A.Col2
     , A.Col3
     , A.Col4
     , A.Col5
FROM TableA A
 LEFT JOIN TableB B ON B.ID = A.TableBID
 LEFT JOIN TableC C ON C.ID = A.TableCID

Ok, clearly TableA is the base relation above. It is the biggest table, it is driving the result set by being joined "from", and visually the columns are even on the "left side" of the result set. (The left side thing actually was a criterion to my colleague).

Now, let's assume that TableA has 1 million rows again, TableB is a "junction" or "bridge" table and has like 500,000 rows and TableC has 1,000,000 rows. So assume the query is just an outer join to get all columns in TableA and TableC where a relationship exists like below...

SELECT A.*
     , C.*
FROM TableC C
 FULL OUTER JOIN TableB B ON C.ID = B.TableAID
 FULL OUTER JOIN TableA A ON A.ID = B.TableCID

Ok so given the last query, can anyone tell me what the "base relation" is? I don't think there is one, but was hoping for another database person's opinion.

Upvotes: 0

Answers (4)

philipxy

Reputation: 15157

A base table is a given named table-valued variable--a database table. That's it. In a query expression its name is a leaf expression denoting its value. "Given table variable" would be more descriptive. A query can use literal notation for a table. It would be reasonable for a given named table-valued constant to also be called "base". It's nothing about some kind of "main" table.

The relational model is founded on a table holding the rows that make a true proposition (statement) from its (characteristic) predicate (statement template parameterized by column names). We give base table rows & get query expression rows.

A query expression that is a base table name comes with a predicate given by the designer.

/* (person, liked) rows where [liker] likes [liked] */
/* (person, liked) rows where Likes(liker, liked) */
SELECT * FROM Likes

A query expression that is a table literal has a certain predicate in terms of columns being equal to values.

/* (person) rows where
    person = 'Bob'
*/
SELECT * FROM (VALUES ('Bob')) dummy (person)

Otherwise a query expression has a predicate built from its constituent table expression predicates according to its relation operator.

Every algebra operator corresponds to a certain logic operator.
NATURAL JOIN & AND
RESTRICTtheta & ANDtheta
UNION & OR
MINUS & AND NOT
PROJECTall butC & EXISTS C
etc

/* (person) rows where
    (FOR SOME liked, Likes(person, liked))
OR  person = 'Bob'
*/
    SELECT liker AS person
    FROM Likes
UNION
    VALUES ('Bob')

/* (person, liked) rows where
FOR SOME [values for] l1.*, l2.*,
        person = l1.liker AND liked = l2.liked
    AND Likes(l1.liker, l1.liked)
    AND Likes(l2.liker, l2.liked)
    AND l1.liked = l2.liker
    AND person = 'Bob'
    AND NOT Likes(l1.liked, 'Ed')
*/
Likes l1 INNER JOIN Likes l2
ON l1.liked = l2.liker
WHERE l1.liker = 'Bob'
AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)

There's no difference to how a base, literal or operator-call query expression is used in determining a containing query expression's predicate.

Is there any rule of thumb to construct SQL query from a human-readable description?
Relational algebra - recode column values

Upvotes: 3

nvogel

Reputation: 25534

The concept of a "driving" table is really an assumption about how the DBMS is expected to execute a query internally. A rule-based query optimizer, in the absence of any index-related preferences, may treat the ordering of tables and joins in a query as significant when it comes to choosing the execution plan. Under a cost-based optimizer, there is no significance to the order of tables and joins so nothing about the structure of the query itself will tell you which table gets read first or in what order the join conditions get evaluated.

When conceptualizing a query it may help to have a mental image of one table being the starting point for the query but I think the answer to the question here must be no. Logically speaking there is no such thing as a driving table.

Upvotes: 3

Gordon Linoff

Reputation: 1271051

The term "base table" has a definition and it has nothing to do with what you describe. A "base table" is pretty much just a "table". That is, it is not a view, it is not a table valued function, it is not the result of a query. It is what gets stored in the database as an explicit table.

What you are seem to be grasping for seems more related to optimization strategies. I have used similar terminology -- in the context of optimization -- to describe the "driving table" being accessed by the optimizer. The purpose of this is to distinguish between different execution plans.

Consider the query:

from t1 join t2 using (col)

There are multiple different execution plans. Here are some methods and what might be considered the "driving table" (if any) for them:

for each row in t1
    for each row in t2
         compare col
-->  t1 is the "driving table"

for each row in t2
    for each row in t1
        compare col
--> t2 is the "driving table"

for each row in t1
    look up t2 value using index on t2(col)
--> t1 is the "driving table"

sort t1 by col
sort t2 by col
compare the rows in the two sorted sets
--> no "driving table"

hash t1 by col
hash t2 by col
compare the hash maps
--> no "driving table"

In other words, the "driving" table has little to do with the query structure. It is based on the optimization strategies used for the query. That said, left joins and right joins limit the optimization paths. So, in a nested loop or index-lookup situation, the "first" (or "last") table would be the driving table.

Upvotes: 4

GMB

Reputation: 222682

Let me suggest a perspective where the base table is the first one in the FROM clause (ie not a JOINed table). In the case where a statement can be equally written with either one table or another as base table, we would say that there are two (or more) base tables.

In your first query, the base table is TableA. If you invert TableA and TableC in the query, you are not guaranteed to get the same results, because of the LEFT JOIN.

In the second query, as you are using FULL JOINs, all 3 tables could be inverted without changing the result, so this is indeed a use case of a query where all tables are base tables.

Upvotes: 2

Is there ALWAYS a &quot;base table&quot; in any database query?

Answers (4)

Related Questions

Is there ALWAYS a "base table" in any database query?