vop
vop

Reputation: 11

Merge join not working as documented Redshift

From AWS Documentation: Merge Join

Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table.

The join I am testing:

select *  
FROM d2l.tbl1 r
JOIN d2l.tbl2 rc
ON r.sortkey1_field = rc.sortkey1_field 

SVV_TABLE_INFO details:

SVV_TABLE_INFO details

Execution plan:

Explan Plan

I have full vacuumed and analyzed both table before running explain.

Also per AWS documentation distyle all should be have the same as two objects having the same distkey. Despite that I have recreated tables with distkey on the given join field and it was still not giving merge on explain plan.

Can someone explain what is going on or at least what would need to be changed to make this a merge join?

Upvotes: 1

Views: 3834

Answers (1)

Tony Gibbs
Tony Gibbs

Reputation: 2489

A merge join requires that both tables are distributed with DIST STYLE KEY with the same key. The first column of the SORT KEY on both tables must also be the same as the DIST KEY. Merge joins also require updated statistics/vacuum.

The reason it's not working in your case is that both your tables are distributed with DISTSTYLE ALL.

Upvotes: 2

Related Questions