Spark filter pushdown with multiple values in subquery

Question

I have a small table adm with one column x that contains only 10 rows. Now I want to filter another table big that is partitioned by y with the values from adm using partition pruning.

While here

select * from big b 
where b.y = ( select max(a.x) from adm a)

the partition filter pushdown works, but unfortunately this:

select * from big b
where b.y IN (select a.x from adm a )

results in a broadcast join between a and b

How can the subquery be pushed down as a partition filter even when I use IN

Spark filter pushdown with multiple values in subquery

Answers (1)

Related Questions