user1836155
user1836155

Reputation: 918

Join types of Netezza

I am having trouble getting an official list of join types in Netezza/PureData.

According to a powerpoint presentation I read, there are 5 types:

  1. Hash Join: Requires equi-conditions, identical types. Fastest
  2. Exist Join: A type of Hash Join?
  3. Expression Join: Also called Nested Loop joins?
  4. Merge Joins: Are they merge sort joins? Some articles mention that are like Hash Joins but for floats
  5. Cross Product Join: Slow?

And as you can see, I have a lot of gaps of knowledge. Unfortunately, I have yet to find a comprehensive documentation that covers this topic. Any input on this will be appreciated.

Upvotes: 1

Views: 9740

Answers (1)

ScottMcG
ScottMcG

Reputation: 3887

Here's my swag at an answer. I think the best way to think about it is to consider how the join is expressed in SQL, and then consider the join algorithm that is used by the database to resolve them under the covers.


Equi-join

Joins of this form, where the join criteria is an equality comparison, and each term is a simple column reference (i.e. not an expression).

SELECT ...
FROM tableA a
   JOIN tableB b
   ON a.col1 = b.col

Algorithms

Netezza will most often resolve Equi-joins using a Hash Join approach where the hashed value of the join columns can be compared to verify equality.

In certain circumstances, Netezza may use a Merge Sort Join approach in cases where the data being joined is already sorted (e.g. using Materialized Views) making this normally expensive approach more attractive, or when the data types being compared are such that a hash comparison is not reliable (e.g. floating point data types).


Expression Joins

Joins of this form, where the join criteria is anything but an equality comparison.

SELECT ...
FROM tableA a
   JOIN tableB b
   ON a.col1 < b.col

Algorithms

Since a simple equality evaluation isn't sufficient, Netezza handles these cases with a Nested Loop approach. Each row in tableA is evaluated individually against each row in tableB. It is for this reason that expression join can never be a co-located join in Netezza. The Nested Loop approach is therefore quite expensive.


Cross Joins

Joins of this form, where there the join criteria does not include any columns from at least one of the table.

SELECT ...
FROM tableA a
   JOIN tableB b
   ON a.col1 = 0

or SELECT ... FROM tableA a, tableB b

Algorithms

Netezza will take cartesion product approach to resolving these, which can be quite expensive, and also cannot be a co-located join as every row in every data slices must be joined to every row in every other data slice.


Implict IN/EXISTS Joins

Statements such as these are not explicitly joins in SQL, but may be processed as joins by the database.

SELECT ...
FROM tableA a
WHERE col1 IN
   (
      SELECT col1
      FROM tableB
   )

or

SELECT ...
FROM tableA a
WHERE NOT EXISTS
   (
      SELECT *
      FROM tableB b
      WHERE a.col1 = b.col1
   )

Algorithms

Netezza will approach these statements with an Exists Join approach, which is fundamentally the same as the Hash Join approach, but since no joined data is required in the result set, it only has to tally existence in the process.

Upvotes: 6

Related Questions