What is a Set with regard to databases?

Question

I have just started to learn about SQL sets yet in the outset no definition of what is a set is, is given, only thing mentioned are UNIONS and INTERSECTS.

Example SQL:

SELECT * FROM sales2005
UNION
SELECT * FROM sales2006;

philipxy · Accepted Answer

The everyday notion of a set is a collection of things or values where each appears at most once and there is no order to them.

We can ask whether a certain thing/value is IN a particular set, ie is an element of it, ie is a member of it. Among relevant operations is the UNION of two sets, which gives the set whose members are in one or both of those sets. INTERSECTION gives the set of values that are in both of two sets. The DIFFERENCE of two sets is the set of values that are members of one but not of the other. "The" "empty" set is the set with no members.

In a relational context we sometimes think of a table as a set of rows. They must share the same number of columns, and sometimes also the same set of column names or list of column names. (Then there's only one empty table.) Other times we think of a table as such a set of rows plus a required shared number of columns, set of column names, or list of column names. (Then, since the number and/or names of columns matter, two tables with no rows can be different tables.) But even such a table sloppily gets called a set.

In an SQL context a table is a collection of rows plus a (non-empty) list of column names. But the collections are not sets, they are bags. A bag is a collection of values where the same value can appear as a member more than once. As long as we only deal with bags where any value appears as a member only once, we can think of them as sets. (SQL tables differ from relational tables in other ways.) Sometimes they just sloppily get called sets anyway.

The bagginess of SQL is why it has both UNION (DISTINCT) and UNION ALL. UNION ALL returns the table whose bag members are the members of the argument bags, each appearing the total number of the times it does among the argument bags. But UNION (DISTINCT) returns the table whose bag is like that but with only one appearance for each member. Its bag is like the set of rows that appear as members of the argument bags.

PS Reasoning about tables holding sets is straightforward. Each table holds the rows that make some fill-in-the-(named-)blanks statement into a true statement. (The blanks are named by columns and they are filled by corresponding values from the rows.) Suppose a table T with columns T1, T2, ... holds rows making "...T1...T2..." true. Then for two tables R & S R UNION S holds the tuples that make "...R1...R2... OR ...S1...S2..." true and R JOIN S holds the rows that make "...R1...R2... AND ...S1...S2..." true. And so on for other table and logic operators. But there are no simple rules like this about bags. This presence of duplicate rows makes SQL queries more difficult to reason about than queries with relational tables. UNION (DISTINCT) and SELECT DISTINCT are used to get rid of duplicates.

What is a Set with regard to databases?

Answers (1)

Related Questions