I am trying to give a broadcast hint for the smaller table, but the physical plan still shows a SortMergeJoin:
spark.sql('select /*+ BROADCAST(pratik_test_temp.crosswalk2016) */ * from pratik_test_staging.crosswalk2016 t join pratik_test_temp.crosswalk2016 c on t.serial_id = c.serial_id').explain()
Note: if I use created_date (the partition column) instead of serial_id as my join condition, the plan does show a broadcast join:
spark.sql('select /*+ BROADCAST(pratik_test_temp.crosswalk2016) */ * from pratik_test_staging.crosswalk2016 t join pratik_test_temp.crosswalk2016 c on t.created_date = c.created_date').explain()
Why is Spark behaving so strangely with the AWS Glue Catalog as my metastore?
In a BROADCAST hint you need to pass the table's alias name, because your SQL statement gives the table an alias (c).
Try /*+ BROADCAST(c) */ * instead of /*+ BROADCAST(pratik_test_temp.crosswalk2016) */ *:
spark.sql('select /*+ BROADCAST(c) */ * from pratik_test_staging.crosswalk2016 t join pratik_test_temp.crosswalk2016 c on t.serial_id = c.serial_id').explain()