Gojoe

Reputation: 115

Designing a Semantic Layer with Multiple Data Sources and Complex Transformation Rules using Snowflake and dbt

I am working on a Data Warehouse (DWH) project that integrates data from three different sources, each providing four tables (12 tables in total). These data sources are unrelated except for some common fields such as Market_ID, Category_ID, Product_ID, Cost, and Sales.

Each data source has around 20 rules/transformations that need to be applied. Here are a couple of simple examples to illustrate:

Rule 1: Geography_Targeted = Market_Country
Rule 2: Brand_Material = Product_Core_Material

The rules are much more complex in reality. Additionally, a product can be compliant with one rule and non-compliant with another on the same day. The rules are independent of each other.

Here is an abstract example of the data:

Market_ID  Category_ID  Product_ID  Date      Geography  Market_Country  Material         Product_Core
ABC        XYZ          12345       1-1-2020  Japan      JPN             Steel;Aluminium  Steel
ABC        XYZ          12345       2-1-2020  Japan                      Steel;Aluminium  Steel
ABC        XYZ          12345       3-1-2020  Japan      JPN             Steel;Aluminium
ABC        XYZ          12345       4-1-2020  Japan      JPN             Steel;Aluminium  Steel
ABC        XYZ          12345       5-1-2020  Japan      JPN
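To illustrate how one of these rules could be evaluated, here is a sketch of a dbt model for the Material/Product_Core rule against data shaped like the sample above. The model name, the staging model `stg_source_a`, and the matching logic are assumptions for illustration only; the real rules are more complex:

```sql
-- models/rule_brand_material.sql (hypothetical dbt model)
-- Flags a row as compliant when the core material appears
-- in the semicolon-delimited Material list, e.g. 'Steel' in 'Steel;Aluminium'.
select
    market_id,
    category_id,
    product_id,
    date,
    cost,
    sales,
    case
        when material is not null
         and product_core is not null
         and position(product_core in material) > 0
        then 1
        else 0
    end as rule_compliance_flag
from {{ ref('stg_source_a') }}
```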

The data can reach up to 15 million records per month.

My goal is to design an efficient semantic layer that Power BI can consume without hitting refresh limits. My proposed solution is to create one table per rule with the following structure: Market_ID, Category_ID, Product_ID, Cost, Sales, Rule_Compliance_Flag.
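Concretely, each per-rule table in this proposal would have a shape like the following (column types are assumptions; the actual types depend on the sources):

```sql
-- Hypothetical shape of one per-rule table in the proposed design
create table rule_01_compliance (
    market_id            varchar,
    category_id          varchar,
    product_id           varchar,
    cost                 number(18,2),
    sales                number(18,2),
    rule_compliance_flag boolean
);
```

With ~20 rules per source, this design would produce roughly 60 such tables, each refreshed into Power BI separately.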

Is there a better approach to handle this scenario? Any suggestions on optimizing the semantic layer for Power BI to efficiently manage and utilize this data?

Thank you for your help!

Upvotes: 0

Views: 48

Answers (0)
