Reputation: 61
I am currently studying databases, and I have a question. The professor told us to create 2 different databases, and then move all the data to a star schema data model.
Here is the diagram for the first database, I have already filled the tables with data.
This is the diagram for the second database, also with data
This is my star schema model
The problem I am facing is that I do not know how to start the mapping when adding my origin OLE DB and destination OLE DB components.
I have already searched through the web, but I only find examples where they have to move just one database to the star schema data model.
Upvotes: 1
Views: 793
Reputation: 61201
The task you have is to consolidate two transactional/OLTP systems into a data warehouse. The reason you find only examples of moving/mapping one system into a data warehouse is that you simply repeat the process for each additional source that feeds into the DW.
In your example, you are integrating two sales systems to produce a unified sales business process report.
My approach would be to copy all the tables as-is into your DW and put them into staging schemas (stage_d1, stage_d2). This way, you have all the data locally and it's internally consistent as of the start of your extract, e.g. as of 9:30 this morning.
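As a rough sketch of that staging step, assuming the DW server can reach both sources (the linked server and database names here are placeholders for your own):

```sql
-- One-time setup: a schema per source system (names are illustrative)
CREATE SCHEMA stage_d1;
GO
CREATE SCHEMA stage_d2;
GO
-- Land each source table as-is; SELECT INTO creates the staging table
-- from the source metadata. Repeat for every table you need.
SELECT * INTO stage_d1.Customer  FROM D1_SERVER.Database1.dbo.Customer;
SELECT * INTO stage_d2.Customers FROM D2_SERVER.Database2.dbo.Customers;
```

In SSIS you'd accomplish the same thing with a simple Source -> Destination data flow per table.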
Now that you have the data locally, you need to transform and enrich the data to populate your dimensions and then your fact table.
Let's analyze Dim_Customer. This table is probably a little light in terms of providing value, but the methodology is what you should focus on. System 1 supplies a first name, last name, city, state and zip code. System 2 gives us a company name, contact name, city, state, postal code and phone.
Given the use of postal code versus zip code, I'd wonder whether we're dealing with international addresses or US-centric (zip code) data. I'd also notice that we don't have an actual address line for this data. The point is: analyze your data so you know you're modeling something that solves the problem (reporting on sales across all systems).
The next question I'd wonder about is how we populate the Customer dimension. If a Mario Almaguer had a purchase in both system 1 and system 2, are they the "same" person? Does it matter for this business process? If we sold to a person in TX and that name also exists in ME, does it matter if the name is in there twice?
I'll assume we only care about unique customer names. If it's a bad assumption, we go back and model it differently.
In my source, I'll write a query.
SELECT DISTINCT CONCAT(C.FirstName, ' ', C.LastName) AS CustomerName FROM stage_d1.Customer AS C;
Run that and confirm it returns the data I want. I'll then use an OLE DB Source in an SSIS data flow, choosing the third data access mode in the drop-down to supply a source query.
If I run the package, we'll get all the unique customer names from the source system. But we only want the names we don't already have, so we need something to check our reference table for existing matches. That's the Lookup component. The source for the Lookup will be the DW's Dim_Customer table. You'll match based on CustomerName. The Lookup component tells us whether an incoming row matched, and we get two output streams: match and no-match. We're only interested in the no-match path because that's new data. Andy Leonard has an excellent Stairway to Integration Services series; in particular, we're talking about an Incremental Load.
From the Lookup, we'll drag the no-match branch to an OLE DB Destination where we point at Dim_Customer table.
You run that and Dim_Customer is populated. Run it again, and no new rows should be added as we're looking for new data only.
Now we need to solve getting the second staged customer data integrated. Fortunately, it's the same steps except this time our query is easier.
SELECT DISTINCT C.ContactName AS CustomerName FROM stage_d2.Customers AS C;
Lather, rinse repeat for all of your other dimensions.
You could also skip the data flows and simply execute a query to do the same.
INSERT INTO dbo.Dim_Customer(CustomerName)
SELECT DISTINCT CONCAT(C.FirstName, ' ', C.LastName) AS CustomerName
FROM stage_d1.Customer AS C
WHERE NOT EXISTS (SELECT * FROM dbo.Dim_Customer AS DC WHERE DC.CustomerName = CONCAT(C.FirstName, ' ', C.LastName));
Lather, rinse, repeat for the remaining dimensions.
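The same set-based pattern covers the second staged source; only the expression that builds CustomerName changes:

```sql
INSERT INTO dbo.Dim_Customer(CustomerName)
SELECT DISTINCT C.ContactName AS CustomerName
FROM stage_d2.Customers AS C
WHERE NOT EXISTS (SELECT * FROM dbo.Dim_Customer AS DC WHERE DC.CustomerName = C.ContactName);
```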
Loading the fact is similar, except we will use Lookup components to find matches (we need to translate our data to the dimensions' surrogate keys). Here I'll show how we'd populate a simplified version of your fact table.
SELECT O.Price AS UnitPrice, BO.OrderDate AS [Date], 1 AS Quantity, 0 AS Discount, CONCAT(C.FirstName, ' ', C.LastName) AS CustomerName
FROM stage_d1.Ordering AS O
INNER JOIN stage_d1.Book_Order AS BO
ON BO.OrderID = O.OrderID
INNER JOIN stage_d1.Customer AS C
ON C.CustomerID = BO.Cus_CustomerID;
That's my source query. The customer lookup will continue to match on Dim_Customer's CustomerName but this time we'll retrieve the CustomerID from the lookup component.
The destination then maps UnitPrice, Date (depending on how you model it), Quantity and Discount directly from our source. The remaining dimension keys we populate through our Lookups.
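If you'd rather do the fact load in pure T-SQL as well, the same source query joined back to the dimension performs the key translation (the fact table and column names here are assumed; substitute your own):

```sql
INSERT INTO dbo.Fact_Sales(UnitPrice, [Date], Quantity, Discount, CustomerID)
SELECT O.Price, BO.OrderDate, 1, 0, DC.CustomerID
FROM stage_d1.Ordering AS O
INNER JOIN stage_d1.Book_Order AS BO
    ON BO.OrderID = O.OrderID
INNER JOIN stage_d1.Customer AS C
    ON C.CustomerID = BO.Cus_CustomerID
INNER JOIN dbo.Dim_Customer AS DC
    ON DC.CustomerName = CONCAT(C.FirstName, ' ', C.LastName);
```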
Upvotes: 1
Reputation: 9768
The standard approach would be the following:
You populate all your dimension tables first and then all your fact tables.
Upvotes: 0