Reputation: 57
A friend's company is working on a data architecture that, to us, seems rather convoluted and has several scalability and cost problems.
If possible, I would like your opinion on the current and proposed architectures (or alternatives), to discuss their pros and cons and potentially find unforeseen problems or limitations.
Current Architecture - Azure Stack
Ingestion Layer
Processing Layer
Loading Layer
Presentation Layer
Pros of this Approach
Cons of this Approach
Proposed Architecture - Azure w/ Delta Lake
The alternative architecture relies on the fact that Azure Databricks is already used within the ETL process and attempts to maximize its usage to provide horizontal scalability and Serverless resources.
Ingestion Layer
Processing and Loading Layers
Presentation Layer
Pros of this Approach
Cons of this Approach
Upvotes: 0
Views: 542
Reputation: 2344
I think you have covered almost everything. Based on my experience, here are a few suggestions. You can consider this approach if your business model allows it.
Ingestion Layer:
Structure Layer:
In this layer, we need to maintain the proper structure of the data. For example, we may sometimes need to convert data from one format to another. Suppose a column has a string type in the source, but the business needs it as a decimal. That kind of processing should be taken care of in this layer.
We can handle this transformation with Azure Databricks.
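To make the string-to-decimal example concrete, here is a minimal sketch of the logic in plain Python. In Databricks this step would more likely be a column cast on a Spark DataFrame; the version below only illustrates the rule. The names `to_decimal`, `structure_rows` and the `amount` column are made up for illustration, and returning `None` for unparseable values is just one possible policy.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def to_decimal(raw: Optional[str], scale: int = 2) -> Optional[Decimal]:
    """Convert a source string to a fixed-scale Decimal.

    Returns None when the value is missing or cannot be parsed
    (an assumed policy; a real pipeline might reject the row instead).
    """
    if raw is None:
        return None
    try:
        value = Decimal(raw.strip())
        if not value.is_finite():  # rejects "NaN" and "Infinity"
            return None
        # Decimal("1").scaleb(-2) is Decimal("0.01"), i.e. two decimal places
        return value.quantize(Decimal("1").scaleb(-scale))
    except InvalidOperation:
        return None


def structure_rows(rows, column, scale=2):
    """Return copies of the rows with `column` converted from str to Decimal."""
    return [{**row, column: to_decimal(row.get(column), scale)} for row in rows]
```

For example, `structure_rows([{"id": 1, "amount": " 12.5 "}], "amount")` gives back a row whose `amount` is `Decimal("12.50")`.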
Serve Layer:
This is the layer where we do all the transformations needed by the reporting layer. For example, Team-1 and Team-2 should be joined in this layer.
We can handle this transformation with Azure Databricks.
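As a rough illustration of the join mentioned above, here is a small sketch in plain Python. The real job would presumably run as a DataFrame join in Databricks; this only shows the shape of the operation. The function name `inner_join` and the `customer_id`, `orders` and `region` fields are hypothetical, since the actual Team-1 and Team-2 schemas are not given.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on `key` using a hash index.

    When both sides share a non-key field, the left row's value wins
    (an arbitrary choice made for this sketch).
    """
    # Index the right side by join key so each left row is matched in O(1)
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for l_row in left:
        for r_row in index.get(l_row[key], []):
            joined.append({**r_row, **l_row})
    return joined


# Hypothetical sample data standing in for the two teams' datasets
team1 = [{"customer_id": 1, "orders": 3}, {"customer_id": 2, "orders": 5}]
team2 = [{"customer_id": 1, "region": "EU"}, {"customer_id": 3, "region": "US"}]
report_rows = inner_join(team1, team2, "customer_id")
```

Only `customer_id` 1 appears in both inputs, so `report_rows` holds a single combined row for that customer.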
Presentation Layer:
Use Azure Synapse Analytics (Serverless) to provide access control and querying capabilities directly on Azure Data Lake Gen2, which is then exposed to Power BI and Excel for governance purposes.
Alternatively, we can connect to the serve layer through a Databricks cluster JDBC connection. Handling all access control becomes very easy if the reporting layer connects through Databricks.
Upvotes: 1