Reputation: 1405
Say I join two tables on their distkey, but I apply a function on the columns in the join condition:
select count(1)
from toto_calls
inner join toto_carriers
on substring(toto_calls.to_carrier, 1, 3) = substring(toto_carriers.carrier, 1, 3)
Will Redshift take advantage of the distkey and be able to collocate the join?
Upvotes: 0
Views: 186
Reputation: 1405
No. For Redshift to collocate the join and avoid broadcasting rows, you need the join to be on the exact columns that are distkeys. Not on the result of some operation on the column. I checked this with query plans:
explain (select count(1) from toto_calls inner join toto_carriers on to_carrier = carrier);
Gives:
XN Aggregate (cost=18514.87..18514.87 rows=1 width=0)
-> XN Hash Join DS_DIST_NONE (cost=0.12..17562.68 rows=380875 width=0)
Hash Cond: (("outer".carrier)::text = ("inner".to_carrier)::text)
-> XN Seq Scan on toto_carriers (cost=0.00..4231.94 rows=423194 width=10)
-> XN Hash (cost=0.10..0.10 rows=9 width=10)
-> XN Seq Scan on toto_calls (cost=0.00..0.10 rows=9 width=10)
Filter: (to_carrier IS NOT NULL)
DS_DIST_NONE, so no broadcasting, yay! While:
explain (
select count(1) from toto_calls
inner join toto_carriers on substring(toto_calls.to_carrier, 1, 3) = substring(toto_carriers.carrier, 1, 3)
);
Gives:
XN Aggregate (cost=2018250.37..2018250.37 rows=1 width=0)
-> XN Hash Join DS_BCAST_INNER (cost=0.12..2018197.47 rows=21160 width=0)
Hash Cond: ("substring"(("outer".carrier)::text, 1, 3) = "substring"(("inner".to_carrier)::text, 1, 3))
-> XN Seq Scan on toto_carriers (cost=0.00..4231.94 rows=423194 width=10)
-> XN Hash (cost=0.10..0.10 rows=10 width=10)
-> XN Seq Scan on toto_calls (cost=0.00..0.10 rows=10 width=10)
DS_BCAST_INNER, so broadcasting, booh...
Upvotes: 2