Reputation: 415
We are currently using MongoDB for one of our "large scale data" products. To give a brief idea, we use Mongo to store a lot of social media data such as tweets, posts, and hashtags, so the use case is social media analytics. So far the only trouble we have with MongoDB is its full-text search capability and aggregation performance.
The number of docs is around 25 million, and we run this on a single instance. Also, most of our analysis is on the entire set (we usually don't have many filters to reduce the analytical dataset). Recently we started looking at Elasticsearch. It's a beautiful tool and searches are extremely fast. So one scenario we were considering was to use it as a search layer on top of Mongo.
But after evaluation we see that ES also has great analytical capability, especially in terms of aggregations. Our question is: is it a good idea to use ES as the ONLY datastore (as a replacement for Mongo)? We see most of the traction for ES as a search layer and not as an analytical tool. Are there any drawbacks to using ES in an analytical capacity? In short, what are the things that Mongo does better than ES?
Upvotes: 2
Views: 1675
Reputation: 10859
In terms of features you should be well covered by Elasticsearch. Filters, queries, and (pipelined) aggregations do everything MongoDB does and a bit more.
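To make the combination of queries, filters, and aggregations concrete, here is a minimal sketch of a search request body, written as a plain Python dict so it runs without a cluster. The field names (`text`, `created_at`, `hashtags`, `retweet_count`) and the helper `build_hashtag_report` are hypothetical; adapt them to your own mapping. It shows a bucket aggregation with a nested metric, not a pipeline aggregation.

```python
import json


def build_hashtag_report(text, since, top_n=10):
    """Build a search body: full-text match + date filter + terms aggregation.

    All field names are assumptions about the mapping, not part of any API.
    """
    return {
        "size": 0,  # skip the hits, return only the aggregation results
        "query": {
            "bool": {
                # scored full-text part of the request
                "must": [{"match": {"text": text}}],
                # non-scoring filter that narrows the dataset
                "filter": [{"range": {"created_at": {"gte": since}}}],
            }
        },
        "aggs": {
            "top_hashtags": {
                # one bucket per hashtag, most frequent first
                "terms": {"field": "hashtags", "size": top_n},
                # metric computed inside each bucket
                "aggs": {"avg_retweets": {"avg": {"field": "retweet_count"}}},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hashtag_report("world cup", "2016-01-01"), indent=2))
```

The resulting JSON would be sent as the body of a `_search` request. This is roughly what a `$match` followed by a `$group` stage would do in a MongoDB aggregation pipeline, with the full-text match added on top.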
I would mainly be careful with the resiliency both solutions provide: Elasticsearch is not a database by design, and "bad" things can happen in certain situations, though they are well documented on the resiliency page. Version 2.3, the latest stable release, or even 5 (which is currently in alpha) gives you a very stable base, and all data-loss issues I've seen in real-world applications (not lab scenarios) were due to bad configurations.
Disclaimer: I work for Elastic.
Upvotes: 5
Reputation: 79
Elasticsearch should be fine for your scenario. For hot data (complex analytics), I would use an analytic database like Exasol. For warm data, you could indeed use Elasticsearch - I would not use MongoDB at all. For cold data (e.g., the original ingestion data), Hadoop may be fine.
When you deal with large volumes on the ingestion side or in the data repository itself, Elasticsearch allows you to create indices per day, per medium, or per whatever, and queries can still run across several of these partial indices. Because most of the data in the repository is then effectively read-only, overall hardware cost is lower than with a database.
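As a rough illustration of the index-per-day idea, the sketch below generates daily index names and joins them into a single search path. The `tweets-YYYY.MM.DD` naming scheme and both helper functions are assumptions made for this example; Elasticsearch itself only requires that the index names in the path be comma-separated (wildcards such as `tweets-2016.05.*` also work).

```python
from datetime import date, timedelta


def daily_indices(prefix, start, end):
    """Return one index name per day from start to end, inclusive."""
    days = (end - start).days
    return [
        f"{prefix}-{(start + timedelta(days=i)):%Y.%m.%d}"
        for i in range(days + 1)
    ]


def search_path(indices):
    """Build the URL path of a search that spans several indices."""
    return "/" + ",".join(indices) + "/_search"


if __name__ == "__main__":
    names = daily_indices("tweets", date(2016, 5, 30), date(2016, 6, 1))
    print(search_path(names))
```

Only the current day's index receives writes; older ones can be optimized, moved to cheaper nodes, or dropped as a whole once they are no longer needed.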
As for analytics, Elasticsearch works very nicely for aggregations and other summary statistics. When it comes to more complex analytic functions, go for a decent analytic database, or maybe you will be able to handle them during ingestion in an Apache Spark pipeline.
Upvotes: -1