HGSF

Reputation: 57

Data Architecture - Full Azure Stack vs Integrated Delta Lake

A friend's company is working on a data architecture that seems rather convoluted to us and appears to have several scalability and cost problems.

If possible, I would like your opinion on the current and proposed architectures (or alternatives), to discuss their pros and cons and potentially uncover unforeseen problems or limitations.

Current Architecture - Azure Stack

Ingestion Layer

Processing Layer

Loading Layer

Presentation Layer

Pros of this Approach

Cons of this Approach

Proposed Architecture - Azure w/ Delta Lake

The alternative architecture relies on the fact that Azure Databricks is already used within the ETL process and attempts to maximize its usage to provide horizontal scalability and Serverless resources.

Ingestion Layer

Processing and Loading Layers

Presentation Layer

Pros of this Approach

Cons of this Approach

Upvotes: 0

Views: 542

Answers (1)

I think you have covered almost everything. Based on my experience, here are a few suggestions. You can consider this approach if your business model allows it.


Ingestion Layer:

  • Each team (business unit) should have its own storage container. Reason: access can then be managed at the team level.
  • In a distributed environment, ELT is generally recommended over ETL, so Azure Data Factory can be used as the ingestion tool to build the data lake. Reason: Databricks is a compute engine, so there is little point in using it for ingestion.
  • Each container should have three folders: stage, raw, and curated.
  • The stage folder holds the incremental loads from the source, for example daily, depending on the load frequency.
  • Each stage load is appended to the data in the raw folder, so the raw folder ends up holding an exact snapshot of the source.
  • Maintain a curated folder as well. Sometimes secure data has to be handled, and this folder lets you isolate it from the other data.
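The folder layout described above can be sketched as a small path helper. This is only an illustration: the storage account, container, and dataset names are hypothetical, the `load_date=` partition folder is my own assumption, and the only thing taken as given is the usual `abfss://<container>@<account>.dfs.core.windows.net/<path>` form of an Azure Data Lake Gen2 URI.

```python
from datetime import date
from typing import Optional

# Folder names follow the three folders described in the list above.
LAYERS = ("stage", "raw", "curated")


def layer_path(account: str, container: str, layer: str, dataset: str,
               load_date: Optional[date] = None) -> str:
    """Build the ADLS Gen2 URI for one dataset in one folder of a team container."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    base = f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{dataset}"
    # Assumption: each incremental stage load lands in its own dated subfolder.
    if layer == "stage" and load_date is not None:
        return f"{base}/load_date={load_date.isoformat()}"
    return base


# Hypothetical names: one container per team, one storage account.
stage_uri = layer_path("mydatalake", "team1", "stage", "orders", date(2024, 1, 2))
raw_uri = layer_path("mydatalake", "team1", "raw", "orders")
```

Keeping one container per team means access can be granted on the container, while the three folders stay identical across teams.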

Structure Layer:

  • In this layer we need to maintain a proper structure for the data. For example, we may need to convert data from one format to another: a column may be a string in the source, while the business needs it as a decimal. That kind of processing should be handled in this layer.

  • This transformation can be handled in Azure Databricks.
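To make the string-to-decimal example concrete, here is a minimal plain-Python sketch of the conversion logic. It is not Databricks code; in Databricks the same idea would normally be a column cast. The function name, the two-decimal scale, and the choice to map unparseable values to `None` are all assumptions for illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def to_decimal(value: Optional[str], scale: int = 2) -> Optional[Decimal]:
    """Convert a source string to a fixed-scale Decimal, or None if it cannot be parsed."""
    if value is None:
        return None
    try:
        number = Decimal(value.strip())
        if not number.is_finite():  # reject "NaN" and "Infinity"
            return None
        # Fix the number of decimal places, e.g. scale=2 quantizes to 0.01.
        return number.quantize(Decimal(10) ** -scale)
    except InvalidOperation:
        return None


# Hypothetical raw rows where "amount" arrives as a string.
raw_rows = [{"id": 1, "amount": " 12.5 "}, {"id": 2, "amount": "n/a"}]
structured_rows = [{**row, "amount": to_decimal(row["amount"])} for row in raw_rows]
```

Returning `None` for bad values keeps the load running; whether to reject the row instead is a business decision.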

Serve Layer:

  • This is the layer where we do all transformations for the reporting layer. For example, data from team 1 and team 2 should be joined in this layer.

  • These transformations can also be handled in Azure Databricks.
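As a rough illustration of the team 1 / team 2 join, the sketch below does an inner join on a shared key in plain Python. The datasets, column names, and the `customer_id` key are hypothetical; in Databricks this would be a DataFrame or SQL join over the structured data rather than Python lists.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner-join two lists of rows on a shared key column."""
    # Index the right side by key so each left row is matched in one lookup.
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical structured data owned by two different teams.
team1_orders = [{"customer_id": 1, "amount": 40}, {"customer_id": 2, "amount": 15}]
team2_customers = [{"customer_id": 1, "region": "EU"}]

report_rows = inner_join(team1_orders, team2_customers, "customer_id")
```

Only the serve layer reads across team containers here, which keeps the cross-team access in one place.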

Presentation Layer:

  • Use Azure Synapse Analytics (serverless) to provide access control and querying directly on Azure Data Lake Gen2, which is then exposed to Power BI and Excel for governance purposes.

  • Alternatively, connect the reporting layer to the serve layer through a JDBC connection to a Databricks cluster. Access control is then easy to handle, because the reporting layer connects through Databricks.

Upvotes: 1
