VB_

Reputation: 45692

Lambda architecture on AWS: choosing a database for the batch layer

We're building a Lambda architecture on the AWS stack. A lack of DevOps knowledge forces us to prefer AWS managed solutions over custom deployments.

Our workflow:

[Batch layer]
Kinesis Firehose -> S3 -Glue-> EMR (Spark) -Glue-> S3 views ------+
                                                                  |===> Serving layer (ECS) => Users
Kinesis -> EMR (Spark Streaming) -> DynamoDB/ElastiCache views ---+
[Speed layer]

We are already using 3 datastores: ElastiCache, DynamoDB, and S3 (queried with Athena). The batch layer produces from 500,000 up to 6,000,000 rows each hour. Only the last hour's results need to be queried by the serving layer with low-latency random reads.

None of our databases fits the batch-insert & random-read requirements. DynamoDB doesn't fit batch inserts: the write throughput required to load the batch output makes it too expensive. Athena is an MPP query engine and, moreover, is limited to 20 concurrent queries. ElastiCache is used by the streaming layer, and we're not sure it's a good idea to perform batch inserts there as well.
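
A back-of-the-envelope estimate of the DynamoDB write throughput involved (assuming 1 KB items, which is a guess on our part, not a measured number):

    # Rough DynamoDB write-throughput estimate for the batch load.
    # Assumption: items are <= 1 KB, so one row costs one write capacity unit (WCU).
    rows_per_hour = 6_000_000      # worst-case batch output per hour (from above)
    seconds_per_hour = 3_600

    wcu_sustained = rows_per_hour / seconds_per_hour
    print(f"~{wcu_sustained:.0f} WCU sustained")  # ~1667 WCU

So even if the load is spread perfectly evenly over the hour, that is a substantial provisioned write capacity for data that is only served for one hour.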

Should we introduce a fourth storage solution or stay with the existing ones?

Considered options:

  1. Persist the batch output to DynamoDB and ElastiCache (the part of the data that is updated rarely and can be compressed/aggregated goes to DynamoDB; frequently updated data, ~8 GB/day, goes to ElastiCache).
  2. Introduce another database (HBase on EMR over S3? Amazon Redshift?) as a solution.
  3. Use S3 Select over Parquet to get around Athena's concurrent query limit. That would also reduce query latency. But does S3 Select have any concurrent query limits? I can't find any related info. (A sketch of this option follows the list.)
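
For option 3, this is roughly what a serving-layer lookup would look like with boto3 (the bucket, key, and column names below are placeholders, not from our actual setup). Note that S3 Select runs against a single object, so the serving layer would have to know which object holds the last hour's output:

    import boto3

    s3 = boto3.client("s3")

    # S3 Select filters a single Parquet object server-side and returns
    # only the matching rows; there is no cluster to manage.
    response = s3.select_object_content(
        Bucket="my-views-bucket",                        # placeholder bucket
        Key="batch-views/2019/06/01/12/part-0.parquet",  # placeholder key
        ExpressionType="SQL",
        Expression="SELECT s.user_id, s.score FROM s3object s WHERE s.user_id = '123'",
        InputSerialization={"Parquet": {}},
        OutputSerialization={"JSON": {}},
    )

    # The result arrives as an event stream; collect the record payloads.
    for event in response["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode("utf-8"))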

The first option seems bad because of the batch inserts into the ElastiCache instance used by streaming. Also, does keeping batch and speed layer views in the same data stores even follow the Lambda architecture?

The second option is bad because it means a fourth database storage, isn't it?

Upvotes: 0

Views: 277

Answers (1)

Jens Roland

Reputation: 27770

In this case you might want to use something like HBase or Druid; not only can they handle batch inserts and very low-latency random reads, they could even replace the DynamoDB/ElastiCache component of your solution, since you can write directly to them from the incoming stream (to a different table).

Druid is probably superior for this, but given your requirements, you'll want HBase, as it is available on EMR with the Amazon Hadoop distribution, whereas Druid doesn't come in a managed offering.
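
To make that concrete, here is a minimal sketch of both access patterns against HBase on EMR using the happybase Python client (the hostname, table, and column names are hypothetical, and it assumes the HBase Thrift server is running on the cluster):

    import happybase

    # Connect to the HBase Thrift server on the EMR master node
    # (hostname and table name are placeholders).
    connection = happybase.Connection("emr-master-node", port=9090)
    table = connection.table("batch_views")

    # Batch insert: puts are buffered client-side and flushed in chunks,
    # which is how the hourly batch output would be loaded.
    with table.batch(batch_size=1000) as batch:
        for i in range(10_000):
            batch.put(
                f"user#{i}".encode(),
                {b"cf:score": str(i % 100).encode()},
            )

    # Low-latency random read by row key, as the serving layer would do.
    row = table.row(b"user#42")
    print(row[b"cf:score"])

The same tables can be written from Spark on the batch side and from Spark Streaming on the speed side; just keep the batch and speed views in separate tables.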

Upvotes: 1
