Ali Lordifar

Reputation: 101

Streaming Data from Different Sources to AWS S3

I have several data sources and need to publish their data to S3 in real time. I also need to process and validate the data before it is delivered to the S3 buckets. I know that AWS Kinesis Data Streams offers real-time data streaming and that I can process the data with AWS Lambda before sending it to S3. However, it is not clear to me whether AWS Glue Streaming can be used instead of Kinesis Data Streams plus Lambda. I have seen some documentation about using AWS Glue Streaming to process real-time data on the fly and send it to S3. So, what are the real differences here? Is AWS Glue Streaming ETL a good choice for streaming and processing data in real time and storing it in S3?

Upvotes: 0

Views: 223

Answers (1)

omuthu

Reputation: 6333

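A Kinesis data stream with a Lambda consumer is a good fit as long as the limits of the Lambda execution environment are sufficient for your processing:

  • Execution time: at most 15 minutes per invocation
  • Memory: bounded by the function's memory configuration
  • Concurrency: subject to Lambda concurrency limits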
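
To make the Lambda-consumer option concrete, here is a minimal sketch of a handler that decodes a batch of Kinesis records, validates them, and writes the valid ones to S3. The required fields, the bucket name `my-example-bucket`, and the `validated/` key prefix are assumptions for illustration only; adapt them to your own schema and bucket.

```python
import base64
import json

# Hypothetical schema: fields every record must contain to count as valid.
REQUIRED_FIELDS = ("id", "timestamp")


def decode_and_validate(event):
    """Return (valid_payloads, rejected_count) for a Kinesis-triggered event."""
    valid, rejected = [], 0
    for record in event.get("Records", []):
        # Lambda delivers Kinesis record data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            payload = json.loads(raw)
        except ValueError:  # malformed JSON or undecodable bytes
            rejected += 1
            continue
        if isinstance(payload, dict) and all(f in payload for f in REQUIRED_FIELDS):
            valid.append(payload)
        else:
            rejected += 1
    return valid, rejected


def handler(event, context):
    valid, rejected = decode_and_validate(event)
    if valid:
        import boto3  # imported lazily so the pure logic above is testable offline

        body = "\n".join(json.dumps(p) for p in valid)
        boto3.client("s3").put_object(
            Bucket="my-example-bucket",  # assumed bucket name
            Key=f"validated/{context.aws_request_id}.json",  # assumed key layout
            Body=body.encode("utf-8"),
        )
    return {"valid": len(valid), "rejected": rejected}
```

Each invocation handles one batch, so the whole batch has to be processed within the 15-minute and memory limits listed above.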

If you go with a Glue consumer instead, your Glue jobs can run for longer, and Glue supports Apache Spark for massively parallel processing.

You can also use Kinesis Data Firehose, which has native integration for delivering data to S3, Elasticsearch, and other destinations, and needs no consumer code when the data can be delivered unchanged. If you need minimal processing, Firehose can invoke a Lambda function that intercepts the records before they are delivered.
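
As a rough sketch of that Firehose option, a transformation Lambda receives a batch of base64-encoded records and must return each `recordId` with a result of `Ok`, `Dropped`, or `ProcessingFailed`. The validation rule (a required `id` field) and the added `validated` flag below are assumptions for illustration only.

```python
import base64
import json


def handler(event, context):
    """Firehose data-transformation handler: validate and lightly enrich records."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Unparseable input: return the original data marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        if "id" not in payload:  # hypothetical validation rule
            # Invalid but well-formed: drop it so it never reaches S3.
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue

        payload["validated"] = True  # hypothetical enrichment
        # Append a newline so objects in S3 end up as newline-delimited JSON.
        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": data.decode("ascii"),
        })
    return {"records": output}
```

With this approach Firehose takes care of buffering and writing to S3, so the function only has to transform or filter records.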

Upvotes: 1
