Reputation: 406
I have an SNS topic which triggers 50 Lambda functions in multiple accounts.
Each Lambda produces some output in JSON format.
I want to aggregate all of those individual JSON outputs into one list and then pass that to another SNS topic.
What is the best way to aggregate the data?
Upvotes: 1
Views: 1514
Reputation: 31
The scenario you describe does not really match the architectural pattern you are choosing. If you know upfront that you will have to deal with state (aggregation means keeping track of state), SNS and SQS are not the right solution, and neither is Lambda on its own.
What is not mentioned in the other answers is that you will also have to handle the possibility that one of your 50 processes fails. You will have to take that into account too. Handling all of these cases should not be your focus, since there are tools that do it for you.
I recommend taking a look at AWS Kinesis: https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
Also, AWS Step Functions provides a solution: https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-parallel-state.html
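To give a rough idea of the Step Functions approach, here is a minimal sketch in Python with boto3: a Parallel state fans out to worker Lambdas, and its output is automatically the list of all branch results, which a final task can publish to SNS. The function ARNs, state machine name and role ARN are placeholders I made up, and only two of the 50 branches are shown.

```python
# Minimal sketch, assuming placeholder ARNs: a Parallel state fans out to
# Lambda tasks; the Parallel state's output is the list of branch results.
import json
import boto3

definition = {
    "StartAt": "FanOut",
    "States": {
        "FanOut": {
            "Type": "Parallel",
            "Branches": [
                {  # one branch per worker Lambda (only two shown here)
                    "StartAt": "Worker1",
                    "States": {
                        "Worker1": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:111111111111:function:worker-1",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "Worker2",
                    "States": {
                        "Worker2": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:us-east-1:222222222222:function:worker-2",
                            "End": True,
                        }
                    },
                },
            ],
            "Next": "Publish",
        },
        "Publish": {  # receives the list of all branch outputs as its input
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111111111111:function:publish-to-sns",
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="aggregate-worker-output",  # placeholder name
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111111111111:role/step-functions-role",  # placeholder role
)
```

Failure handling (retries, catch states) can then be declared per task instead of being hand-rolled in your own code.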
Upvotes: 1
Reputation: 6130
I would suggest looking at DynamoDB for aggregating the information, if the data being stored lends itself to that.
The various components can drop their data in asynchronously, then the aggregator can perform a single query to pull in the whole result set.
Although it's described as a database, it can be viewed as a simple object store or lookup engine, so you do not really have to think about data keys, only a way to distinguish each contribution from the others.
So you might store under "lambda-id + timestamp", which ensures each record is distinct, and then you can just retrieve all records. Don't forget to have a way to retire records, so the system does not fill up!
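A minimal sketch of that idea in Python with boto3, assuming a hypothetical table named `lambda-results` with a partition key `result_id` and a TTL attribute `expires_at` for retiring old records:

```python
# Minimal sketch with boto3; the table name "lambda-results", the key
# "result_id" and the TTL attribute "expires_at" are assumptions.
import json
import time
import boto3

table = boto3.resource("dynamodb").Table("lambda-results")

def store_result(lambda_id: str, payload: dict) -> None:
    """Called by each worker Lambda: drop its JSON output into the table."""
    table.put_item(Item={
        # "lambda-id + timestamp" keeps every contribution distinct
        "result_id": f"{lambda_id}#{int(time.time() * 1000)}",
        "payload": json.dumps(payload),
        # DynamoDB TTL retires the record so the table does not fill up
        "expires_at": int(time.time()) + 24 * 3600,
    })

def collect_results() -> list:
    """Called by the aggregator: pull every stored contribution into one list."""
    items, kwargs = [], {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return [json.loads(item["payload"]) for item in items]
```

The aggregator can then publish the collected list to the second SNS topic in one `sns.publish` call.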
Upvotes: 0
Reputation: 14905
There are a couple of architecture solutions you can use to solve this. There is probably no single "right one"; it will depend on the volume of data, the frequency of triggers, and your budget.
You will need some shared storage where your 50 Lambda functions can temporarily store their results, and another component, most probably another Lambda function, in charge of the aggregation to produce the final result.
Depending on the volume of data to handle, I would first consider a shared Amazon S3 bucket where all 50 of your functions can drop their piece of JSON, and the aggregation function could read and assemble all the pieces. Other services that can act as shared storage are Amazon DynamoDB and Amazon Kinesis.
The difficulty will be to detect when all the pieces are available to start the final aggregation. If 50 is a fixed number, that will be easy; otherwise you will need to think about a mechanism to tell the aggregation function it can start to work...
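To make the S3 variant concrete, here is a minimal sketch in Python with boto3, assuming a hypothetical bucket `aggregation-bucket`, a fixed count of 50 pieces, and a made-up result topic ARN. Each worker drops its JSON under a common prefix, and the aggregator only proceeds once all pieces are present.

```python
# Minimal sketch with boto3; the bucket name, prefix, expected piece count
# and topic ARN are placeholders, not values from the question.
import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

BUCKET = "aggregation-bucket"              # placeholder bucket
PREFIX = "run-001/"                        # one prefix per aggregation run
EXPECTED_PIECES = 50                       # fixed number of contributing Lambdas
RESULT_TOPIC_ARN = "arn:aws:sns:us-east-1:111111111111:aggregated-results"  # placeholder

def drop_piece(lambda_id: str, payload: dict) -> None:
    """Each of the 50 functions writes its JSON piece under the shared prefix."""
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{PREFIX}{lambda_id}.json",
        Body=json.dumps(payload).encode("utf-8"),
    )

def aggregate(event, context):
    """Aggregation function: once all pieces exist, assemble them and publish."""
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    if len(keys) < EXPECTED_PIECES:
        return  # not all pieces have arrived yet; try again on the next trigger

    pieces = [
        json.loads(s3.get_object(Bucket=BUCKET, Key=key)["Body"].read())
        for key in keys
    ]
    sns.publish(TopicArn=RESULT_TOPIC_ARN, Message=json.dumps(pieces))
```

The aggregator can be triggered by S3 event notifications or on a schedule; if the number of pieces is not fixed, you would need some other completion signal instead of the simple count check above.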
Upvotes: 3