user1460691

Reputation: 71

How to share global Map values among RDDs in Spark?

I am trying to access a Map from RDDs that are on different compute nodes, but without success. The Map looks like this:

val map1 = Map("aa" -> 1, "bb" -> 2, "cc" -> 3, ...)

Every RDD has to check whether a key is in the Map, so it seems I have to make the Map itself global. The problem is that if the Map is stored as an RDD and spread across the different nodes, each node will only see a piece of it, which is not enough to check a key against the whole Map (and then replace the key with the corresponding value). E.g.:

val matchs = Vecs.map(term => term.map { case (a, b) => (map1(a), b) })

Any idea about this? Thanks!

Upvotes: 0

Views: 441

Answers (1)

Justin Pihony

Reputation: 67085

It sounds like you simply want to use a broadcast variable:

val broadCastMap = sc.broadcast(map1)
Vecs.map(term => term.map { case (a, b) => (broadCastMap.value(a), b) })
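
For a fuller picture, here is a minimal self-contained sketch of the same idea. It assumes a local Spark setup, and the object name BroadcastLookup, the local[*] master, and the sample data standing in for Vecs are all made up for illustration. It also uses get instead of apply on the Map, since calling a Scala Map directly with a missing key throws NoSuchElementException; whether you want to drop unknown keys or handle them some other way depends on your data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program; names and sample data are illustrative only.
object BroadcastLookup {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally on all cores just to try the idea out.
    val conf = new SparkConf().setAppName("BroadcastLookup").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The lookup table is built once on the driver...
    val map1 = Map("aa" -> 1, "bb" -> 2, "cc" -> 3)
    // ...and shipped in full to every executor as a read-only broadcast variable.
    val broadcastMap = sc.broadcast(map1)

    // Assumed shape of Vecs: an RDD whose elements are sequences of (key, weight) pairs.
    val vecs = sc.parallelize(Seq(
      Seq(("aa", 0.5), ("zz", 1.0)), // "zz" is deliberately not in the map
      Seq(("cc", 2.0))
    ))

    // Replace each key with its mapped value; pairs with unknown keys are dropped.
    val matches = vecs.map { term =>
      term.flatMap { case (a, b) =>
        broadcastMap.value.get(a).map(id => (id, b))
      }
    }

    matches.collect().foreach(println)
    sc.stop()
  }
}
```

Because each executor gets the complete Map, every task can check any key locally. This works as long as the Map is small enough to fit in memory on each node.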

Upvotes: 1
