dataviews
dataviews

Reputation: 3090

hadoop with mongodb and hadoop vs mongodb

I'm trying to understand key differences between mongoDB and Hadoop. I understand that mongoDB is a database, while Hadoop is an ecosystem that contains HDFS. Some similarities in the way that data is processed using either technology, while major differences as well.

I'm confused as to why someone would use mongoDB over the Hadoop cluster, mainly what advantages does mongoDB offer over Hadoop. Both perform parallel processing, both can be used with Spark for further data analytics, so what is the value add one over the other.

Now, if you were to combine both, why would you want to store data in mongoDB as well as HDFS? MongoDB has map/reduce, so why would you want to send data to hadoop for processing, and again both are compatible with Spark.

Upvotes: 0

Views: 1100

Answers (2)

Anuj Gupta
Anuj Gupta

Reputation: 76

Firstly, we should know what these two terms mean.

HADOOP Hadoop is an open-source tool for Big Data analytics developed by the Apache foundation. It is the most popularly used tool for both storing as well as analyzing Big Data. It uses a clustered architecture for the same. Hadoop has a vast ecosystem and this ecosystem comprises of some robust tools.

MongoDB MongoDB is an open-source, general-purpose, document-based, distributed NoSQL database built for storing Big Data. MongoDB has a very rich query language which results in high performance. MongoDB is a document-based database, which implies that it stores data in JSON-like format documents.

DIFFERENCES

enter image description here

Both these tools are good enough for harnessing Big Data. It depends on your requirements. For some projects, Hadoop would be a good option and some MongoDB fits well.

Hope this helps you to distinguish between the two.

Upvotes: 1

EnvyChan
EnvyChan

Reputation: 56

First lets look at what we're talking about

  • Hadoop - an ecosystem. Two main components are HDFS and MapReduce.
  • MongoDB - Document type NoSQL DataBase.

Lets compare them on two types of workloads

  1. High latency high throughput (Batch processing) - Dealing with the question of how to process and analyze large volumes of data. Processing will be made in a parallel and distributed way in order to finalize and retrieve results in the most efficient way possible. Hadoop is the best way to deal with such a problem, managing and processing data in a distributed and parallel way across several servers.

  2. Low Latency and low throughput (immediate access to data, real time results, a lot of users) - When dealing with the need to show immediate results in the quickest way possible, or make small parallel processing resulting in NRT results to several concurrent users a NoSQL database will be the best way to go.

A simple example in a stack would be to use Hadoop in order to process and analyze massive amounts of data, then store your end results in MongoDB in order for you to:

  1. Access them in the quickest way possible
  2. Reprocess them now that they are on a smaller scale

The bottom line is that you shouldn't look at Hadoop and MongoDB as competitors, since each one has his own best use case and approach to data, they compliment and complete each other in your work with data.

Hope this makes sense.

Upvotes: 2

Related Questions