Reputation: 115
I am working on a Data Warehouse (DWH) project that integrates data from three different sources, each providing four tables (12 tables in total). These data sources are unrelated except for some common fields such as Market_ID, Category_ID, Product_ID, Cost, and Sales.
Each data source has around 20 rules/transformations that need to be applied. Here are a couple of simple examples to illustrate:

- Geography_Targeted = Market_Country
- Brand Material = Product_Core_Material

In reality the rules are much more complex. Additionally, a product can be compliant with one rule and non-compliant with another on the same day. The rules are independent of each other.
Here is an abstract example of the data:
| Market_ID | Category_ID | Product_ID | Date | Geography | Market_Country | Material | Product_Core |
|---|---|---|---|---|---|---|---|
| ABC | XYZ | 12345 | 1-1-2020 | Japan | JPN | Steel;Aluminium | Steel |
| ABC | XYZ | 12345 | 2-1-2020 | Japan |  | Steel;Aluminium | Steel |
| ABC | XYZ | 12345 | 3-1-2020 | Japan | JPN | Steel;Aluminium |  |
| ABC | XYZ | 12345 | 4-1-2020 | Japan | JPN | Steel;Aluminium | Steel |
| ABC | XYZ | 12345 | 5-1-2020 | Japan | JPN |  |  |
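To make the rule logic concrete, here is a minimal pandas sketch of how two such compliance flags could be computed against this sample. The column names come from the table above; the country lookup and the semicolon-split check are simplified assumptions for illustration, not the real rules:

```python
import pandas as pd

# Hypothetical sample mirroring the abstract table above (first three days only).
df = pd.DataFrame(
    {
        "Market_ID": ["ABC", "ABC", "ABC"],
        "Category_ID": ["XYZ", "XYZ", "XYZ"],
        "Product_ID": [12345, 12345, 12345],
        "Date": ["1-1-2020", "2-1-2020", "3-1-2020"],
        "Geography": ["Japan", "Japan", "Japan"],
        "Market_Country": ["JPN", None, "JPN"],
        "Material": ["Steel;Aluminium", "Steel;Aluminium", "Steel;Aluminium"],
        "Product_Core": ["Steel", "Steel", None],
    }
)

# Rule 1 (assumed logic): the targeted geography must resolve to the market country.
geography_map = {"Japan": "JPN"}  # assumption: a lookup from Geography to country code
df["Rule_Geography_Flag"] = df["Geography"].map(geography_map).eq(df["Market_Country"])

# Rule 2 (assumed logic): the product's core material must appear in the
# semicolon-separated Material list.
df["Rule_Material_Flag"] = df.apply(
    lambda r: bool(r["Product_Core"]) and r["Product_Core"] in str(r["Material"]).split(";"),
    axis=1,
)

print(df[["Date", "Rule_Geography_Flag", "Rule_Material_Flag"]])
```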
The data can reach up to 15 million records per month.
My goal is to design an efficient semantic layer so this data can be used in Power BI without hitting refresh limits. My proposed solution is to create one table per rule with the following structure: Market_ID, Category_ID, Product_ID, Cost, Sales, Rule_Compliance_Flag.
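As a rough illustration of that proposed structure, here is a pandas sketch that builds one such table for a single rule. The Cost/Sales values, the aggregation grain (dropping Date), and the "compliant only if every day passes" logic are assumptions made for the example, not the actual requirements:

```python
import pandas as pd

# Hypothetical daily data with a compliance flag already computed for one rule.
daily = pd.DataFrame(
    {
        "Market_ID": ["ABC", "ABC", "ABC"],
        "Category_ID": ["XYZ", "XYZ", "XYZ"],
        "Product_ID": [12345, 12345, 12345],
        "Cost": [10.0, 12.0, 11.0],
        "Sales": [25.0, 30.0, 27.0],
        "Rule_Geography_Flag": [True, False, True],
    }
)

# One output table per rule (repeated for each of the ~20 rules per source).
rule_table = (
    daily.groupby(["Market_ID", "Category_ID", "Product_ID"], as_index=False)
         .agg(
             Cost=("Cost", "sum"),
             Sales=("Sales", "sum"),
             # assumption: the flag is true only if every day in the period complies
             Rule_Compliance_Flag=("Rule_Geography_Flag", "min"),
         )
)
print(rule_table)
```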
Is there a better approach to handle this scenario? Any suggestions on optimizing the semantic layer for Power BI to efficiently manage and utilize this data?
Thank you for your help!
Upvotes: 0
Views: 48