Reputation: 33
As far as I know, broadcast is useful for giving each worker a local copy of a variable, and the variable must fit in the worker's memory.
In my case, however, I want a local copy of a large variable that does not fit in the worker's memory.
How can I distribute this large variable without using Spark's broadcast function?
Upvotes: 3
Views: 2266
Reputation: 29237
Question:
In my case, however, I want a local copy of a large variable that does not fit in the worker's memory.
How can I distribute this large variable without using Spark's broadcast function?
AFAIK this is not possible for data that won't fit into a worker's memory,
either with sc.broadcast(..)
or with functions.broadcast(..) (the join hint).
Please also be aware that there is a memory limit of 2 GB (TorrentBroadcast); see SPARK-6235 - Address various 2G limits.
Instead, you can ingest the data you want to broadcast into Hadoop/HBase (or any NoSQL store), or maybe memcached, and then look it up from the workers.
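A minimal sketch of that lookup approach, assuming the store's client offers some kind of batched get. The DictStore class and its get_many method are hypothetical stand-ins for a real HBase/memcached client, not an actual library API; only the per-partition shape is the point here.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


class DictStore:
    """Hypothetical stand-in for a real key-value client (HBase, memcached, ...)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def get_many(self, keys: List[str]) -> Dict[str, str]:
        # A real client would issue one batched network request here.
        return {k: self._data[k] for k in keys if k in self._data}

    def close(self) -> None:
        # Nothing to release for the in-memory stand-in.
        pass


def lookup_partition(
    keys: Iterable[str],
    open_store: Callable[[], DictStore],
    batch_size: int = 1000,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Look up every key of one partition, opening the store once per partition."""
    store = open_store()
    try:
        it = iter(keys)
        while True:
            # Pull keys in batches so only batch_size keys are held at a time.
            batch = list(islice(it, batch_size))
            if not batch:
                break
            found = store.get_many(batch)
            for k in batch:
                # Missing keys are reported as None instead of being dropped.
                yield k, found.get(k)
    finally:
        store.close()
```

With PySpark this could be wired in as rdd.mapPartitions(lambda part: lookup_partition(part, open_store)), where open_store is a function that creates the real client on the worker. The large dataset then never has to be held in executor memory; each task only holds one batch of results at a time.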
Upvotes: 0
Reputation: 1832
large variable that does not fit in the worker's memory
As Ram mentioned above, if it doesn't fit in the worker's memory, there is no way you can use it, even if you could broadcast it.
If you're trying to do lookups against a large dataset, you can create a connection pool to a database on each worker node. If you have a model, you can save the model to each worker node and read the file during foreachPartition. Depending on your use case, there may be other solutions.
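A rough sketch of the model-file idea, assuming the model is a JSON file of feature weights already copied to the same local path on every worker, and that it fits in memory once a task has loaded it. The file format and the names load_model and score_partition are made up for illustration.

```python
import json
from typing import Dict, Iterable, Iterator


def load_model(path: str) -> Dict[str, float]:
    """Read the (hypothetical) JSON weights file from the worker's local disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def score_partition(
    rows: Iterable[Dict[str, float]], model_path: str
) -> Iterator[float]:
    """Score every row of one partition, reading the model file once per partition."""
    weights = load_model(model_path)  # one file read per partition, not per row
    for row in rows:
        # Linear score: features missing from the model contribute 0.
        yield sum(weights.get(name, 0.0) * value for name, value in row.items())
```

In PySpark this would typically be passed to rdd.mapPartitions(lambda part: score_partition(part, model_path)) when the scores are needed back as an RDD. With foreachPartition the function would instead write its results out itself, since foreachPartition returns nothing.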
Upvotes: 1