Shivaprasad

Reputation: 167

MongoDB Hadoop integration for faster data processing

MongoDB can be integrated with Hadoop for faster data processing, but during this integration (MongoDB -> Hadoop), data gets transferred from MongoDB to Hadoop. My questions are:

1. Isn't the cost of transferring data from MongoDB to Hadoop higher than the cost of processing that data in MongoDB itself?

2. Is the data transfer (MongoDB -> Hadoop) a one-time activity? If yes, how will later updates to MongoDB be reflected in Hadoop?

Upvotes: 3

Views: 357

Answers (1)

d0x

Reputation: 11571

To meet the "single source of truth" principle, you should try not to copy the data and not to keep it redundantly in HDFS.

To avoid that, the Mongo-Hadoop connector allows you to query MongoDB directly instead of the local HDFS. This of course has the drawback that your production database gets more load. The alternative is to query against your MongoDB BSON dumps.
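As a rough sketch of how that choice looks in a job setup (assuming the mongo-hadoop connector is on the classpath; the URI, database, collection, and dump path are hypothetical):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import com.mongodb.hadoop.BSONFileInputFormat;
import com.mongodb.hadoop.MongoInputFormat;

public class JobSetup {

    public static Job buildJob(boolean readLiveMongo) throws Exception {
        Job job = Job.getInstance();
        job.setJobName("mongo-hadoop-example");

        if (readLiveMongo) {
            // Option A: read straight from the production MongoDB.
            // Simple, but adds read load to the live database.
            job.getConfiguration().set("mongo.input.uri",
                    "mongodb://mongo-host:27017/mydb.mycollection"); // hypothetical URI
            job.setInputFormatClass(MongoInputFormat.class);
        } else {
            // Option B: read BSON dumps (e.g. produced by mongodump) that
            // were copied into HDFS, keeping load off the production database.
            job.setInputFormatClass(BSONFileInputFormat.class);
            FileInputFormat.addInputPath(job,
                    new Path("/dumps/mycollection.bson")); // hypothetical path
        }
        return job;
    }
}
```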

To your questions:

To 1.: If the Hadoop nodes are "near" the Mongo nodes, the transfer isn't too much overhead. Using Hadoop's MapReduce also gives you access to tools like Hive and Pig, which you can't use with MongoDB's map-reduce. And it lets you scale the "calculation power" on demand without touching your database (all Hadoop nodes will be used; with MongoDB you need to take care of the shard key).
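As a rough illustration of the Hadoop side, when the mongo-hadoop connector is the input format, a mapper receives each MongoDB document as a BSONObject; the `status` field here is a hypothetical example:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;

// Counts documents per "status" value; the connector hands each
// MongoDB document to the mapper as a BSONObject.
public class StatusCountMapper
        extends Mapper<Object, BSONObject, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(Object key, BSONObject doc, Context context)
            throws IOException, InterruptedException {
        Object status = doc.get("status"); // hypothetical field
        if (status != null) {
            context.write(new Text(status.toString()), ONE);
        }
    }
}
```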

To 2.: You do it over and over again. (Unless you are using a capped collection and have configured a stream to process it, but I guess you aren't using those.)
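For completeness, the capped-collection "stream" mentioned above would look roughly like this sketch with the MongoDB Java driver; the host, database, and collection names are hypothetical, and the collection must be capped for tailable cursors to work:

```java
import com.mongodb.CursorType;
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class TailCappedCollection {

    public static void main(String[] args) {
        MongoClient client = new MongoClient("mongo-host"); // hypothetical host

        MongoCollection<Document> events = client
                .getDatabase("mydb")      // hypothetical database
                .getCollection("events"); // must be a capped collection

        // A tailable-await cursor does not close at the end of the data;
        // it blocks and delivers new documents as they are inserted.
        try (MongoCursor<Document> cursor = events.find()
                .cursorType(CursorType.TailableAwait)
                .iterator()) {
            while (cursor.hasNext()) {
                Document doc = cursor.next();
                System.out.println(doc.toJson()); // e.g. forward into Hadoop
            }
        } finally {
            client.close();
        }
    }
}
```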

You should read about the Lambda Architecture in the Big Data book (http://www.manning.com/marz/). It explains really nicely why you would combine something like MongoDB and Hadoop.

Upvotes: 1
