Reputation: 1065
I'm reading Kimball's data-warehouse book at the moment and at the same time, I'm looking at designing a data warehouse for the following example:
1: Client can buy products which would generate an invoice which then would generate a inventory row. Note: The inventory row is a accumulating table, i.e. if they order Product A with a quantity of 5, and then another 5, there would be 2 invoice lines, and one inventory row of 10 units.
So based on the scenario described, the following design would be appropriate: 1. Client dimension 2. Product dimension 3. Date dimension 3. Invoice fact 4. Inventory fact
Now, let's say I want to find out answers to questions like 1. What clients currently have zero inventory but have generated invoices in the last 3 months . There will be other similar questions like these as well. 2. What clients have product A but don't have product B
Based on Kimballs book, it seems like there would need be some sort of joining between the two fact tables, like a subset?
This would seem rather inefficient, especially the sheer size of records between the 2 fact tables. Is this the right approach?
Upvotes: 1
Views: 107
Reputation: 19184
What clients currently have zero inventory but have generated invoices in the last 3 months.
SELECT C.*
FROM DimClient C
WHERE EXISTS (
SELECT * FROM FactInvoice I
WHERE C.ClientKey = I.ClientKey
AND InvoiceDate >= DATEADD(m,GetDate(),-3)
)
AND NOT EXISTS (
SELECT * FROM FactInventory V
WHERE InventoryLevel > 0
AND V.ClientKey = C.ClientKey
AND (Maybe a predicate to limit to current inventory only)
)
What clients have product A but don't have product B
SELECT C.*
FROM DimClient C
WHERE EXISTS (
SELECT * FROM FactInvoice I
WHERE C.ClientKey = I.ClientKey
AND I.ProductID IN (1,2,3)
)
AND NOT EXISTS (
SELECT * FROM FactInvoice I
WHERE C.ClientKey = I.ClientKey
AND I.ProductID NOT IN (4,5,6)
)
Are these queries inefficient? No more inefficient than the same query run against the source system
In fact your source system is probably going to need more tables, more joins, and will probably have less indexes
One of the basis for star schemas is that they are optimised for analytics.
Upvotes: 1
Reputation: 191
According to Kimball that seems to be the right approach to building drilling-across. He mentions that for this to work correctly it should be set up to use conformed dimensions as defined here and explained here. I'm not able to speak to how efficient/inefficient that would be in this case though.
Upvotes: 0