Reputation: 918
I am having trouble getting an official list of join types in Netezza/PureData.
According to a powerpoint presentation I read, there are 5 types:
And as you can see, I have a lot of gaps of knowledge. Unfortunately, I have yet to find a comprehensive documentation that covers this topic. Any input on this will be appreciated.
Upvotes: 1
Views: 9740
Reputation: 3887
Here's my swag at an answer. I think the best way to think about it is to consider how the join is expressed in SQL, and then consider the join algorithm that is used by the database to resolve them under the covers.
Equi-join
Joins of this form, where the join criteria is an equality comparison, and each term is a simple column reference (i.e. not an expression).
SELECT ...
FROM tableA a
JOIN tableB b
ON a.col1 = b.col
Algorithms
Netezza will most often resolve Equi-joins using a Hash Join approach where the hashed value of the join columns can be compared to verify equality.
In certain circumstances, Netezza may use a Merge Sort Join approach in cases where the data being joined is already sorted (e.g. using Materialized Views) making this normally expensive approach more attractive, or when the data types being compared are such that a hash comparison is not reliable (e.g. floating point data types).
Expression Joins
Joins of this form, where the join criteria is anything but an equality comparison.
SELECT ...
FROM tableA a
JOIN tableB b
ON a.col1 < b.col
Algorithms
Since a simple equality evaluation isn't sufficient, Netezza handles these cases with a Nested Loop approach. Each row in tableA is evaluated individually against each row in tableB. It is for this reason that expression join can never be a co-located join in Netezza. The Nested Loop approach is therefore quite expensive.
Cross Joins
Joins of this form, where there the join criteria does not include any columns from at least one of the table.
SELECT ...
FROM tableA a
JOIN tableB b
ON a.col1 = 0
or SELECT ... FROM tableA a, tableB b
Algorithms
Netezza will take cartesion product approach to resolving these, which can be quite expensive, and also cannot be a co-located join as every row in every data slices must be joined to every row in every other data slice.
Implict IN/EXISTS Joins
Statements such as these are not explicitly joins in SQL, but may be processed as joins by the database.
SELECT ...
FROM tableA a
WHERE col1 IN
(
SELECT col1
FROM tableB
)
or
SELECT ...
FROM tableA a
WHERE NOT EXISTS
(
SELECT *
FROM tableB b
WHERE a.col1 = b.col1
)
Algorithms
Netezza will approach these statements with an Exists Join approach, which is fundamentally the same as the Hash Join approach, but since no joined data is required in the result set, it only has to tally existence in the process.
Upvotes: 6