user1676389

Reputation: 73

MongoDB with Spark

I have a question regarding the inner workings of the Spark driver for MongoDB.

Suppose you have a cluster running a sharded MongoDB alongside Hadoop and Spark. When I use the Spark driver to process data from MongoDB, does Spark go through the database's front end, or does it take advantage of the sharding and read the data from each shard separately?

Thanks

Upvotes: 2

Views: 559

Answers (1)

Sergey Lihoman

Reputation: 98

MongoDB and Hadoop clusters are logically separate, but data locality improves performance: a worker needs no network operations when the data it needs is on a shard hosted on the same node. If the collection isn't sharded, workers will incur network operations (except for workers on the primary's host).
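To make the locality argument concrete, here is a toy model in plain Python. It is only a sketch and not how the connector actually schedules work: the function names, the node names, and the "balanced load" rule are all assumptions made up for illustration. Each input split prefers the worker on the host that stores its data, and any split that cannot be placed there counts as a remote (network) read.

```python
import math

def assign_splits(split_hosts, workers):
    """Toy scheduler (hypothetical, not the connector's real logic).

    split_hosts: list of (split_name, host_storing_the_data)
    workers:     iterable of hosts that run a Spark worker
    Returns a list of (split_name, worker_host, is_local).
    """
    # Assumption: work is balanced, so no worker takes more than its share.
    capacity = math.ceil(len(split_hosts) / len(workers))
    load = {w: 0 for w in workers}
    assignments = []
    for split, data_host in split_hosts:
        if data_host in load and load[data_host] < capacity:
            worker = data_host  # local read, no network hop
        else:
            # Fall back to the least-loaded worker (ties broken by name).
            worker = min(sorted(load), key=lambda w: load[w])
        load[worker] += 1
        assignments.append((split, worker, worker == data_host))
    return assignments

def remote_reads(assignments):
    """Count the splits that have to be read over the network."""
    return sum(1 for _, _, is_local in assignments if not is_local)

workers = ["node1", "node2", "node3"]

# Sharded collection: chunks are spread over the three nodes.
sharded = [("chunk0", "node1"), ("chunk1", "node2"),
           ("chunk2", "node3"), ("chunk3", "node1")]

# Unsharded collection: all the data lives on the primary (node1).
unsharded = [(name, "node1") for name, _ in sharded]

sharded_remote = remote_reads(assign_splits(sharded, workers))
unsharded_remote = remote_reads(assign_splits(unsharded, workers))
print(sharded_remote, unsharded_remote)
```

In this model the sharded layout lets every split be read locally, while the unsharded layout forces the workers on node2 and node3 to pull their data from node1 over the network.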

Maybe you will find this useful: http://www.ikanow.com/how-well-does-mongodb-integrate-with-hadoop/

Upvotes: 2
