Reputation: 10876
We can easily move data between different AWS services, e.g. Kinesis to DynamoDB, or AWS IoT to Redshift.
But what is the best strategy for saving streaming data to, say, MongoDB (which has no AWS PaaS offering; Atlas exists, but it has no integrations with other AWS services)?
I can see some third-party solutions out there, but what is the best strategy to implement on AWS itself? Is executing a Lambda function for each insert (with batching) the only option?
Upvotes: 4
Views: 1000
Reputation: 1195
The solution depends mostly on your use case. How fast do you need to insert the data into your MongoDB?
If you need a near-real-time solution, then Kinesis and Lambda is your best option (assuming you don't want to invest in third-party products). If you can afford a delay and do batching, then you can save the Kinesis stream into S3 and then use AWS Glue to process and load your data into the database.
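For the near-real-time path, a minimal sketch of a Kinesis-triggered Lambda handler (the MONGODB_URI environment variable and the database/collection names are assumptions for illustration) could look like this:

```python
import base64
import json
import os

from pymongo import MongoClient

# Created outside the handler so the connection is reused across
# warm Lambda invocations. MONGODB_URI is an assumed env variable.
client = MongoClient(os.environ["MONGODB_URI"])
collection = client["streaming"]["events"]  # assumed database/collection

def handler(event, context):
    # Kinesis delivers records base64-encoded inside the event payload.
    docs = [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]
    if docs:
        # One bulk insert per batch keeps round trips (and cost) down.
        collection.insert_many(docs)
    return {"processed": len(docs)}
```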
What you mostly need to think about is what you need to do with the data.
If you are collecting sensor data where you only care about aggregations (e.g. clicks in a UI), then it is better to store the raw data in S3 and then run a data pipeline (using AWS Glue, for example) to store the aggregated data in MongoDB. S3 will be faster and cheaper for that type of data.
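To make that pipeline concrete: a Glue job is essentially a PySpark script, so a minimal sketch that aggregates raw events from S3 and writes only the aggregates to MongoDB might look like the following (the bucket path, field names, and MONGODB_URI are all assumptions; a real Glue job would typically use Glue's own connectors and job bookmarks):

```python
import os

from pymongo import MongoClient
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregate-to-mongo").getOrCreate()

# Read the raw events that Kinesis/Firehose landed on S3 (path assumed).
raw = spark.read.json("s3://my-raw-bucket/events/")

# Aggregate, e.g. click counts per page per day (field names assumed).
daily = (
    raw.groupBy(F.date_format("timestamp", "yyyy-MM-dd").alias("day"), "page")
       .agg(F.count("*").alias("clicks"))
)

# The aggregated result is small, so collecting it to the driver and
# inserting with pymongo keeps the sketch simple.
docs = [row.asDict() for row in daily.collect()]
client = MongoClient(os.environ["MONGODB_URI"])  # assumed env variable
if docs:
    client["analytics"]["daily_clicks"].insert_many(docs)
```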
If you are using the stream to pass business entities (e.g. documents that provide value on their own), then a near-real-time solution using AWS Lambda will be the better choice.
Without knowing the exact use case, I would propose storing in your database only the data that provides value (e.g. reports on aggregated data) and using S3 with a lifecycle policy for the raw "sensor" data.
Upvotes: 1
Reputation: 647
I am assuming that you are using Kinesis Firehose. If that's the case, you can:
1. Configure Firehose to write to S3 every 5 minutes; each delivery creates a new file on S3.
2. Trigger a Lambda function when the new file appears on S3.
3. Have the Lambda read the new file and write its data to MongoDB (a sketch follows below).
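A minimal sketch of steps 2-3, assuming the Firehose output is newline-delimited JSON and MONGODB_URI is an environment variable configured on the Lambda:

```python
import json
import os

import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
client = MongoClient(os.environ["MONGODB_URI"])  # assumed env variable
collection = client["streaming"]["events"]       # assumed names

def handler(event, context):
    # The S3 trigger passes the bucket and key of each newly created file.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Firehose concatenates records; assuming one JSON document per line.
        docs = [json.loads(line) for line in body.splitlines() if line.strip()]
        if docs:
            collection.insert_many(docs)
```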
If you are using Kinesis (not Firehose), you can simply write a Kinesis consumer that reads data from the stream and writes it directly to MongoDB.
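A bare-bones version of such a consumer with boto3 (single shard, no checkpointing; the stream name and MONGODB_URI are assumptions, and a production consumer would use the Kinesis Client Library instead):

```python
import json
import os
import time

import boto3
from pymongo import MongoClient

kinesis = boto3.client("kinesis")
collection = MongoClient(os.environ["MONGODB_URI"])["streaming"]["events"]  # assumed

STREAM = "my-stream"  # assumed stream name

# Read a single shard from the beginning; the KCL would handle
# checkpointing and multi-shard coordination for you.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    docs = [json.loads(r["Data"]) for r in out["Records"]]  # Data is raw bytes
    if docs:
        collection.insert_many(docs)
    iterator = out["NextShardIterator"]
    time.sleep(1)  # stay under the per-shard read limits
```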
FYI, there is DocumentDB, which exposes a MongoDB-compatible API; you can use that as AWS-hosted MongoDB.
Upvotes: 4
Reputation: 31
You can invoke a Lambda function on each Firehose delivery, and that Lambda can insert into MongoDB hosted on EC2. You can batch operations to reduce the number of Lambda invocations (and in turn reduce cost).
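As a rough sketch, that Lambda could be wired up as a Firehose data-transformation function, so it receives records in batches and does one bulk insert per invocation (MONGODB_URI and the collection names are assumptions):

```python
import base64
import json
import os

from pymongo import MongoClient

collection = MongoClient(os.environ["MONGODB_URI"])["streaming"]["events"]  # assumed

def handler(event, context):
    # Firehose hands the Lambda a whole batch of base64-encoded records.
    docs = [json.loads(base64.b64decode(r["data"])) for r in event["records"]]
    if docs:
        # One insert_many per batch instead of one insert per record.
        collection.insert_many(docs)
    # Return the records unchanged so Firehose can still deliver them
    # to its configured destination (e.g. S3).
    return {
        "records": [
            {"recordId": r["recordId"], "result": "Ok", "data": r["data"]}
            for r in event["records"]
        ]
    }
```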
Upvotes: 3