Reputation: 496
Recently I started using DynamoDB to store events with a structure like this:
{"start_date": "2016-04-01 15:00:00", "end_date": "2016-04-01 15:30:00", "from_id": 320, "to_id": 360, "type": "yourtype", "duration": 1800}
But when I started to analyze the data, I ran into the fact that DynamoDB has no aggregations, plus read/write limits, response size limits, etc. Then I installed a plugin to index the data into ES. As a result, I see that I don't need DynamoDB anymore. So my question is: when do you definitely need a NoSQL instance (in my case DynamoDB) alongside Elasticsearch? Will it degrade ES performance when you store not only indexes there, but full documents? (Yes, I know ES is just an index, but in some cases such an approach could be more cost-effective than running a MySQL cluster.)
Upvotes: 3
Views: 913
Reputation: 6671
The reason you would write data to DynamoDB and then have it automatically indexed in Elasticsearch using DynamoDB Streams is that DynamoDB (or MySQL, for that matter) is considered a reliable data store. Elasticsearch is an index, and generally speaking it isn't considered an appropriate place to store data that you really can't afford to lose.
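For illustration, here is a minimal sketch of what such a stream-triggered indexer might look like: a Lambda function subscribed to the table's stream that forwards each new item to Elasticsearch. The endpoint URL and index name are assumptions, and a managed AWS ES domain would additionally require SigV4-signed requests.

```python
import json
import urllib.request

# Hypothetical Elasticsearch endpoint and index; replace with your own.
ES_URL = "https://my-es-domain.example.com/events/_doc"

def lambda_handler(event, context):
    """Forward new/updated DynamoDB items to Elasticsearch."""
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # this sketch ignores deletions
        image = record["dynamodb"]["NewImage"]
        # Stream images wrap every attribute in a type tag, e.g.
        # {"duration": {"N": "1800"}}; unwrap them into a flat document.
        # (Numbers arrive as strings; a real indexer would convert them.)
        doc = {key: list(value.values())[0] for key, value in image.items()}
        request = urllib.request.Request(
            ES_URL,
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(request)
```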
DynamoDB by itself has issues storing time-series event data, and aggregating is impossible, as you have stated. However, you can use DynamoDB Streams in conjunction with AWS Lambda and a separate DynamoDB table to materialize views for aggregations, depending on what you are trying to compute. Depending on your use case and the flexibility you need, this may be worth considering; a sketch of the idea follows.
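As a sketch of that materialized-view approach, the Lambda below keeps per-day running totals in a second table using DynamoDB's atomic ADD update. The table name `event_aggregates` and its `day` key are assumptions for this example, not anything prescribed.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical aggregates table keyed by calendar day.
AGG_TABLE = "event_aggregates"

def lambda_handler(event, context):
    """Maintain per-day event counts and total duration as events arrive."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        day = image["start_date"]["S"][:10]   # e.g. '2016-04-01'
        duration = image["duration"]["N"]
        # ADD performs an atomic in-place increment, so concurrent
        # Lambda invocations won't lose updates.
        dynamodb.update_item(
            TableName=AGG_TABLE,
            Key={"day": {"S": day}},
            UpdateExpression="ADD event_count :one, total_duration :d",
            ExpressionAttributeValues={
                ":one": {"N": "1"},
                ":d": {"N": duration},
            },
        )
```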
Using Elasticsearch as the only destination for things such as logs is generally considered acceptable if you are willing to accept the possibility of data loss. If the records you want to store and analyze are too valuable to lose, you should store them somewhere else and have Elasticsearch be the copy that you query. Elasticsearch allows very flexible aggregations, so it is an excellent tool for this type of use case.
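For example, a query like the one below would bucket events per day and sum their durations. This is a sketch against a hypothetical `events` index, using only the standard library:

```python
import json
import urllib.request

# Hypothetical endpoint; the 'events' index name is an assumption.
SEARCH_URL = "https://my-es-domain.example.com/events/_search"

query = {
    "size": 0,  # we only want the aggregation buckets, not the hits
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "start_date", "interval": "day"},
            "aggs": {
                "total_duration": {"sum": {"field": "duration"}}
            },
        }
    },
}

request = urllib.request.Request(
    SEARCH_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(json.load(urllib.request.urlopen(request)))
```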
As a complete alternative, you can use Amazon Kinesis Firehose to ingest the events and persistently store them in S3. You can then use an S3 event to trigger an AWS Lambda function that sends the data to Elasticsearch, where you can aggregate it. This is an affordable solution, with the only major downside being the buffering delay (at least 60 seconds) that Firehose imposes. With this approach, if you lose data in your Elasticsearch cluster, you can still reload it from the files stored in S3.
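A minimal sketch of the ingestion side, assuming a Firehose delivery stream named `events-stream` (hypothetical) that is already configured to deliver to your S3 bucket:

```python
import json
import boto3

firehose = boto3.client("firehose")

def put_event(event):
    """Send one event through Firehose; it lands in S3, from where an
    S3 event can trigger the Lambda that indexes into Elasticsearch."""
    firehose.put_record(
        DeliveryStreamName="events-stream",  # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

put_event({
    "start_date": "2016-04-01 15:00:00",
    "end_date": "2016-04-01 15:30:00",
    "from_id": 320,
    "to_id": 360,
    "type": "yourtype",
    "duration": 1800,
})
```

The trailing newline on each record keeps the objects that Firehose batches into S3 line-delimited, which makes them easy to replay later if the Elasticsearch cluster ever needs to be rebuilt.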
Upvotes: 4