Reputation: 73
I have a question regarding the inner workings of the Spark driver for MongoDB.
Suppose you have a cluster running a sharded MongoDB deployment alongside Hadoop and Spark. When I use the Spark driver to process data from MongoDB, does Spark go through the database's front end, or does it take advantage of the sharding and read the data from each shard separately?
Thanks
Upvotes: 2
Views: 559
Reputation: 98
MongoDB and Hadoop clusters are logically separate, but data locality improves performance: a worker needs no network operations when the data it needs is on a shard hosted on the same machine. If the collection isn't sharded, workers will incur network operations (except workers running on the primary's host).
Maybe you will find this useful: http://www.ikanow.com/how-well-does-mongodb-integrate-with-hadoop/
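To make the locality argument concrete, here is a toy model in plain Python. It is only a sketch and does not reflect how the connector actually schedules reads: the `Chunk` class, the host names, and both assignment functions are hypothetical, made up for illustration. It counts how many chunk reads cross the network when a worker runs on a different host than the shard holding the chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a block of documents stored on one host."""
    chunk_id: int
    shard_host: str  # host of the shard (or primary) that holds this chunk


def round_robin(chunks, workers):
    """Assign chunks to workers in turn, ignoring where the data lives."""
    return {c.chunk_id: workers[i % len(workers)] for i, c in enumerate(chunks)}


def locality_aware(chunks, workers):
    """Prefer the worker on the chunk's own host; otherwise fall back to round robin."""
    return {
        c.chunk_id: c.shard_host if c.shard_host in workers else workers[i % len(workers)]
        for i, c in enumerate(chunks)
    }


def count_remote_reads(chunks, assignment):
    """A read is remote when the assigned worker is not on the chunk's host."""
    return sum(1 for c in chunks if assignment[c.chunk_id] != c.shard_host)


# Assumed layout: three machines, each running a Spark worker and a MongoDB shard.
workers = ["node1", "node2", "node3"]

# Sharded collection: chunks are spread over the shards.
sharded = [Chunk(0, "node1"), Chunk(1, "node1"), Chunk(2, "node2"), Chunk(3, "node3")]

# Unsharded collection: everything sits on the primary's host (node1 here).
unsharded = [Chunk(i, "node1") for i in range(4)]

local_first = count_remote_reads(sharded, locality_aware(sharded, workers))
ignore_locality = count_remote_reads(sharded, round_robin(sharded, workers))
unsharded_spread = count_remote_reads(unsharded, round_robin(unsharded, workers))

print(local_first, ignore_locality, unsharded_spread)
```

In this model, sharded data with locality-aware placement needs no remote reads, while the unsharded collection forces every worker that is not on the primary's host to fetch its data over the network.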
Upvotes: 2