Reputation: 1585
I have to design a cost-optimized solution which should be AWS cloud-native. The problem I have to solve: I have 90 million messages coming from a database. Every event is independent and no ordering is required to process them. I have to process every message, do some operation on it, and then put it into the Elasticsearch service.
The solution I have thought of is below:
AWS API --> Lambda --> SNS --> SQS(1) --> Lambda --> ES
                           --> SQS(2) --> Lambda --> ES
Basically, SNS is used so that multiple SQS queues can consume the same messages at the same time.
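Roughly, each consuming Lambda would look something like the Python sketch below; the ES_ENDPOINT variable and the "messages" index name are placeholders I made up, and auth/retry handling is omitted:

# Sketch of the SQS -> Lambda -> ES consumer. ES_ENDPOINT and the
# "messages" index are assumed names, not part of the actual design.
import json
import os

import urllib3

http = urllib3.PoolManager()
ES_ENDPOINT = os.environ["ES_ENDPOINT"]  # e.g. https://my-domain.es.amazonaws.com

def handler(event, context):
    # SQS delivers up to 10 records per invocation by default.
    for record in event["Records"]:
        doc = json.loads(record["body"])
        # ... do some operation on the message here ...
        resp = http.request(
            "POST",
            f"{ES_ENDPOINT}/messages/_doc",
            body=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        if resp.status >= 300:
            # Raising makes SQS retry the batch (and eventually DLQ it).
            raise RuntimeError(f"ES indexing failed: {resp.status}")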
While doing this I thought: why can't we use S3, so that every record is persisted forever and can be replicated into another region? Also, we can invoke a Lambda function on every PUT event in S3.
So my plan is: if we use S3, then for 90 million records we will be creating 90 million objects in S3, and then using CloudFront we can read them, or even without CloudFront we can read from S3 using a Lambda function.
API --> S3 --> Lambda --> ES
S3 throughput is around 3,500 PUT requests per second per prefix and 5,500 GET requests per second per prefix. The cost of a PUT request in S3 and in SQS is almost the same.
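The S3 variant's Lambda would be triggered on object-created events; again a sketch with the same assumed names:

# Sketch of the S3 -> Lambda -> ES variant, triggered on s3:ObjectCreated.
# ES_ENDPOINT and the "messages" index are assumed names, as above.
import json
import os
from urllib.parse import unquote_plus

import boto3
import urllib3

s3 = boto3.client("s3")
http = urllib3.PoolManager()
ES_ENDPOINT = os.environ["ES_ENDPOINT"]

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        obj = s3.get_object(Bucket=bucket, Key=key)
        doc = json.loads(obj["Body"].read())
        http.request(
            "POST",
            f"{ES_ENDPOINT}/messages/_doc",
            body=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )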
Can someone tell me what is wrong with using the S3-based solution? I know using SQS looks very obvious here, but what is the risk if we use S3 in this case?
The read throughput I am looking for is 5k per second.
Even cost-wise the SQS option looks costlier, because I need to pay for both SNS and SQS, whereas with S3 I pay only for the S3 PUTs and Lambda.
Please suggest.
Upvotes: 2
Views: 3453
Reputation: 8887
I wouldn't do either of those; I'd do this:
API --> SNS --> Lambda --> ES
            --> Lambda --> ES
SNS to Lambda will run as many Lambdas as are necessary to handle the request load, up to the concurrency limits on your account or the limits set on the Lambda. The only reason to put SQS in there is for some added resiliency, but I'd probably just do that on the Lambda, as a dead-letter queue.
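As a rough sketch of that wiring with boto3 (all ARNs below are placeholders; note the Lambda's execution role also needs sqs:SendMessage on the DLQ):

# Wire SNS -> Lambda and attach an SQS dead-letter queue for failed
# async invocations. All ARNs are placeholders.
import boto3

sns = boto3.client("sns")
lam = boto3.client("lambda")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:messages"
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:index-to-es"
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:index-to-es-dlq"

# Allow SNS to invoke the function.
lam.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="sns-invoke",
    Action="lambda:InvokeFunction",
    Principal="sns.amazonaws.com",
    SourceArn=TOPIC_ARN,
)

# Subscribe the function to the topic.
sns.subscribe(TopicArn=TOPIC_ARN, Protocol="lambda", Endpoint=FUNCTION_ARN)

# SNS invokes Lambda asynchronously; after the built-in retries fail,
# the event is sent to the dead-letter queue.
lam.update_function_configuration(
    FunctionName=FUNCTION_ARN,
    DeadLetterConfig={"TargetArn": DLQ_ARN},
)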
Upvotes: 1