What happens if we broadcast the larger table in a join?

I wanted to know what happens if we broadcast the larger table while joining it to a smaller one. Also, if we have two equally large tables, what happens when we use a broadcast join in that scenario?

Answers (1)

Sanket9394

There are a few things to consider:

  1. Spark upper limit: Spark supports broadcast tables of up to 8 GB. If the table you broadcast is larger than that, the query fails.
  2. Driver and executor memory: The table is first collected into the driver's memory and then copied to every executor. As long as both the driver and the executors have enough memory to hold it, it will be broadcast successfully.
  3. Performance: Once the table is broadcast, that much of each executor's memory is occupied by it, and only what is left is available for further operations, which may slow them down. For example, if executor memory is 8 GB and the broadcast table takes 6 GB, at most about 2 GB remain for everything else.
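The three checks above can be sketched as a rough back-of-the-envelope helper. This is plain Python, not a Spark API: `broadcast_feasibility` and `BROADCAST_LIMIT_BYTES` are hypothetical names, the 8 GB cutoff is taken from point 1, and Spark's real memory accounting (reserved memory, memory fractions, serialized vs. deserialized size) is deliberately ignored, so the "memory left" figure is an optimistic upper bound.

```python
GB = 1024 ** 3

# Cutoff quoted in point 1 (Spark rejects broadcasts of 8 GB or more).
# Hypothetical constant for this sketch; not a Spark API.
BROADCAST_LIMIT_BYTES = 8 * GB


def broadcast_feasibility(table_bytes, driver_mem_bytes, executor_mem_bytes):
    """Hypothetical helper: rough check of the three considerations above.

    Returns (ok, reason, executor_bytes_left). Spark's real memory accounting
    is ignored, so executor_bytes_left is an optimistic upper bound.
    """
    # 1. Hard upper limit on broadcast size.
    if table_bytes >= BROADCAST_LIMIT_BYTES:
        return False, "exceeds the 8 GB broadcast limit", None
    # 2. The table is collected on the driver first, then copied to executors.
    if table_bytes > driver_mem_bytes:
        return False, "does not fit in driver memory", None
    if table_bytes > executor_mem_bytes:
        return False, "does not fit in executor memory", None
    # 3. Whatever the broadcast occupies is unavailable for other work.
    return True, "ok", executor_mem_bytes - table_bytes


if __name__ == "__main__":
    # The example from point 3: 8 GB executors, 6 GB broadcast table
    # (the 16 GB driver size is an arbitrary assumption).
    ok, reason, left = broadcast_feasibility(6 * GB, 16 * GB, 8 * GB)
    print(ok, reason, left / GB)  # True ok 2.0
```

Note that nothing in the helper asks whether the table is the "large" or the "small" side of the join; only its own size and the available memory matter.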

So, to answer your question: the behaviour of a broadcast depends on what you broadcast, not on whether the other table in the join is large or small. Broadcasting is an independent mechanism, and Spark simply uses it in joins.
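To illustrate this, here is a toy, single-process Python model of a broadcast hash join. It is a sketch under simplifying assumptions, not Spark's implementation, and `broadcast_hash_join` and `split` are hypothetical names (in PySpark the side to broadcast is chosen with `pyspark.sql.functions.broadcast(df)`). Whichever side is passed as `broadcast_rows` becomes the hash table that every executor would hold in full. The join result is the same either way; only the size of that per-executor copy changes, so with two equally large tables the copy is large whichever one you pick.

```python
from collections import defaultdict


def broadcast_hash_join(streamed_partitions, broadcast_rows, key):
    """Toy, single-process model of a broadcast hash join (not Spark code).

    broadcast_rows: the side you choose to broadcast. It is built into one
    hash table; on a real cluster a full copy would live on every executor.
    streamed_partitions: the other side, already split into partitions. Each
    partition is probed on its own, so this side does not need a shuffle.
    Returns (joined_rows, rows_copied_to_each_executor).
    """
    table = defaultdict(list)
    for row in broadcast_rows:
        table[row[key]].append(row)

    joined = []
    for partition in streamed_partitions:  # one task per partition
        for row in partition:
            for match in table.get(row[key], []):
                joined.append({**match, **row})
    return joined, len(broadcast_rows)


def split(rows, n):
    """Split a list of rows into n round-robin partitions."""
    return [rows[i::n] for i in range(n)]


if __name__ == "__main__":
    small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    large = [{"id": i % 3, "val": i} for i in range(9)]

    # Broadcast the small table vs. broadcast the large table.
    r1, copied1 = broadcast_hash_join(split(large, 3), small, "id")
    r2, copied2 = broadcast_hash_join(split(small, 2), large, "id")

    same = sorted(r1, key=lambda r: (r["id"], r["val"])) == sorted(
        r2, key=lambda r: (r["id"], r["val"])
    )
    print(same)              # True
    print(copied1, copied2)  # 2 9
```

Broadcasting the larger table still produces a correct join; it just means each executor holds 9 rows instead of 2 in this toy, which on real data is exactly the memory cost described in points 2 and 3.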
