j9dy

Reputation: 2179

Spark - How to create a variable that is different for each executor context?

My Spark application launches several executors, and my data is split into partitions that get spread across those executors.

When using map() on these partitions, I want to use a MongoDB connection (via the MongoDB Java Driver) to query additional data, process it, and return it as the output of the map() function.

I want to create one connection per executor. Each partition should then access this executor-local variable and use it to query the data.

Establishing a connection for each partition is probably not a good idea. Broadcasting the connection won't work either because it is not serializable (I think?).
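For context, the usual hand-rolled workaround in Spark is a lazily initialized singleton: a Scala `object` is constructed at most once per JVM, so each executor ends up with exactly one client. This is only a sketch, not code from the question: the object name, the connection URI, and the database/collection names are made up, and it assumes the synchronous MongoDB Java driver (`MongoClients.create` exists from driver 3.7 onward).

```scala
import com.mongodb.client.{MongoClient, MongoClients}
import org.bson.Document

// Hypothetical helper: a Scala `object` is initialized lazily, at most once
// per JVM, so every task running on the same executor shares one MongoClient.
object MongoHolder {
  lazy val client: MongoClient = MongoClients.create("mongodb://localhost:27017")
}

val enriched = rdd.mapPartitions { iter =>
  // All partitions scheduled on this executor reuse the same client.
  val coll = MongoHolder.client.getDatabase("mydb").getCollection("mycoll")
  iter.map { key =>
    // Illustrative lookup: fetch extra data for each element.
    coll.find(new Document("_id", key)).first()
  }
}
```

Using `mapPartitions` rather than `map` keeps the database/collection lookup to once per partition instead of once per element.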

To sum it up: how can I create a single MongoDB connection per executor and have every partition processed on that executor reuse it?

Upvotes: 1

Views: 426

Answers (1)

Ross

Reputation: 18111

You should use the MongoConnector.

It will handle creating a connection and is backed by a cache that efficiently handles the shutdown of any MongoClients. It is serialisable, so it can be broadcast, and it can take options, a ReadConfig, or the Spark context to configure where to connect.

MongoConnector uses the loan pattern to handle reference management of the underlying connection to MongoDB and allows access at the MongoClient, MongoDatabase or the MongoCollection level.
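A hedged sketch of how that might look with the connector's Scala API. The `withCollectionDo` loan method appears in the connector's documentation, but exact signatures can differ between connector versions, and the configuration values and lookup logic here are placeholders:

```scala
import org.bson.Document
import com.mongodb.client.MongoCollection
import com.mongodb.spark.MongoConnector
import com.mongodb.spark.config.ReadConfig

val readConfig = ReadConfig(sc)    // picks up spark.mongodb.input.* settings
val connector = MongoConnector(sc) // serialisable, so it is safe to use inside closures

val results = rdd.mapPartitions { iter =>
  // Loan pattern: the connector lends a cached, client-backed collection to
  // the closure and manages the connection's lifecycle itself.
  connector.withCollectionDo(readConfig, { coll: MongoCollection[Document] =>
    // Force evaluation with .toList so all queries run before the loan ends.
    iter.map(key => coll.find(new Document("_id", key)).first()).toList
  }).iterator
}
```

Materializing the partition inside the loaned block matters: if the lazy iterator escaped the closure, the queries would run after the connector had already reclaimed the connection.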

Upvotes: 1
