Reputation: 1217
I have multiple AWS Kinesis Data Streams/Firehose delivery streams carrying structured data in CSV format. I need to run analytics on that data with Kinesis Data Analytics, but a Kinesis Data Analytics application reads from only one stream. The data streams may also exist in different regions.
Problem: How can I merge multiple Kinesis data streams into one for Kinesis Data Analytics?
Upvotes: 2
Views: 2136
Reputation: 91
I recently implemented a solution for joining multiple sets of streaming data, and I ran into the same issue you describe in your question.
Indeed, a Kinesis Data Analytics (KDA) application takes only one stream as its input data source, so when you are dealing with multiple streams you have to standardize the schema of the data flowing into KDA. To work around this, you can use a Python snippet inside a Lambda function to flatten and standardize any event by converting its entire payload to a JSON-encoded string. The Lambda function then sends the flattened events to a single Kinesis Data Stream. The image below illustrates this process:
Note that after this stage both JSON events have the same schema and no nested fields. Yet, all information is preserved. In addition, the ssn field is placed on the header to be used as join key later on.
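As a rough illustration of the flattening step described above, here is a minimal sketch and not the code from the linked article. The function names `standardize` and `to_kinesis_entry` and the envelope fields `source` and `payload` are hypothetical; only the `ssn` join key comes from the answer. The sketch assumes each incoming event has already been decoded into a Python dict.

```python
import json


def standardize(event: dict, source: str, key_field: str = "ssn") -> dict:
    """Wrap any event in a fixed, flat envelope (hypothetical field names).

    The join key is copied to the top level and the whole original event is
    kept as a JSON-encoded string, so no information is lost.
    """
    return {
        "source": source,
        key_field: str(event.get(key_field, "")),
        "payload": json.dumps(event, sort_keys=True),
    }


def to_kinesis_entry(flat: dict, key_field: str = "ssn") -> dict:
    """Build one entry in the shape expected by Kinesis PutRecords."""
    return {
        "Data": json.dumps(flat).encode("utf-8"),
        # PartitionKey must not be empty, so fall back to the source name.
        "PartitionKey": flat[key_field] or flat["source"],
    }


if __name__ == "__main__":
    # Two events with different, nested schemas (made-up sample data).
    order = {"ssn": "123", "item": {"sku": "A1", "qty": 2}}
    person = {"ssn": "123", "name": {"first": "Ana", "last": "Silva"}}
    for src, evt in (("orders", order), ("people", person)):
        print(standardize(evt, src))
```

Both events end up with the same three top-level fields, so a single KDA input schema can cover them, and the original content can be recovered from `payload` when needed.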
I wrote a detailed explanation of this solution here: https://medium.com/@guilhermeepassos/joining-and-enriching-multiple-sets-of-streaming-data-with-kinesis-data-analytics-24b4088b5846
I hope this helps!
Upvotes: 0
Reputation: 21
This is a late answer, but I am adding it for completeness.
You can also do this with Kinesis Data Analytics for Apache Flink, a managed Apache Flink service from AWS, if you don't mind writing a bit of code in Java or Python: https://docs.aws.amazon.com/kinesisanalytics/latest/java/how-it-works.html
If you are still exploring the streaming data, i.e. you are in the development phase, you can use a Studio notebook: https://docs.aws.amazon.com/kinesisanalytics/latest/java/how-notebook.html
Disclaimer: I work for the Amazon Kinesis team
Upvotes: 2
Reputation: 700
I don't know if there are any "off the shelf" products from AWS you can use to do this but it's pretty simple if you don't mind writing a little bit of code.
The resulting Kinesis stream should contain the merged data you are looking for, and you can use it as the input for Kinesis Data Analytics.
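One possible way to write that "little bit of code" (an assumption on my part, not necessarily what this answer had in mind) is a Lambda function subscribed to each source stream that forwards every record to a single target stream. The names `MERGED_STREAM`, `MERGED_REGION`, `build_entries`, and `forward` are hypothetical. The sketch relies on the standard Kinesis event shape delivered to Lambda (`Records[].kinesis.data`, base64-encoded) and on the boto3 `put_records` call, which accepts at most 500 records per request.

```python
import base64
import os

# Hypothetical configuration: name and region of the merged target stream.
MERGED_STREAM = os.environ.get("MERGED_STREAM", "merged-stream")
MERGED_REGION = os.environ.get("MERGED_REGION", "us-east-1")


def build_entries(event: dict) -> list:
    """Turn a Lambda Kinesis event into PutRecords entries."""
    entries = []
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        entries.append({
            "Data": base64.b64decode(kinesis["data"]),  # payload arrives base64-encoded
            "PartitionKey": kinesis["partitionKey"],
        })
    return entries


def forward(event: dict, client, stream_name: str) -> int:
    """Forward all records to the target stream; return the failed count."""
    entries = build_entries(event)
    failed = 0
    for start in range(0, len(entries), 500):  # PutRecords limit: 500 per call
        response = client.put_records(
            StreamName=stream_name, Records=entries[start:start + 500]
        )
        failed += response.get("FailedRecordCount", 0)
    return failed


def handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    # The client targets the merged stream's region, so source streams in
    # other regions can forward into it.
    client = boto3.client("kinesis", region_name=MERGED_REGION)
    failed = forward(event, client, MERGED_STREAM)
    if failed:
        # Raising makes Lambda retry the batch; a real function might retry
        # only the failed records instead.
        raise RuntimeError(f"{failed} records were not written")
```

Attaching the same function to every source stream as an event source gives one merged stream. Note that this forwards records unchanged, so it only helps if all source streams already share a schema.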
Upvotes: 1