Reputation: 11
I think I may have found a typo in Spark version 3.1.1. I am using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.11).
scala> spark.conf.get("spark.sql.autoBroadcastJoinThreshold")
res0: String = 10485760b
But I think it should possibly be: 104857600.
Therefore:
scala> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 104857600)
When deployed with "10485760b", Spark does not seem to detect that one of the joined DataFrames is small (10 MB by default), so the automatic broadcast join detection appears to be disabled. I hope my comment helps someone.
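For context, this is roughly how one could check whether a broadcast join is actually planned (a minimal spark-shell sketch with made-up DataFrames, not my real data):

scala> val small = spark.range(100).toDF("id")        // tiny DataFrame, well under the 10 MB default
scala> val large = spark.range(1000000).toDF("id")    // larger DataFrame
scala> large.join(small, "id").explain()              // physical plan should show BroadcastHashJoin if the threshold applies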
Upvotes: 1
Views: 1909
Reputation: 18495
This is not a typo but the correct value.
According to the documentation on Spark configuration, autoBroadcastJoinThreshold
has a default value of 10MB and is defined as
"Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join."
Your proposed value of 104857600
would result in 104857600 / 1024 / 1024 = 100 MB,
which could hurt the application's performance, since tables up to that size would then be broadcast to every executor.
In addition, the beginning of that documentation page explains what the "b" stands for: it is the unit suffix for a size given in bytes.
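For illustration, here is a minimal spark-shell sketch (assuming the predefined spark session) showing that the value is a byte-size string, that a unit suffix can be used instead, and that -1 disables automatic broadcast joins entirely:

scala> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10485760b")   // the default: 10 MB expressed in bytes
scala> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10mb")        // the same size written with a unit suffix
scala> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)            // disables automatic broadcast joins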
Upvotes: 2