sam
sam

Reputation: 149

Staging design strategy

I am in the process of building a Data Warehouse(DW) and I have a question about loading data; I would appreciate if you guys provide your thoughts on this.

I am planning to load all the tables one-to-one in a staging database first and then load the data into the DW from the staging database. I thought about hitting the OLTP system directly(no staging) but am not 100% sure this would be the best approach from a performance perspective.

Let me give you an example: In our OLTP database, we have a view called Customers that I’ll be pulling into our DW. The view on OLTP database is pretty complex and a select statement takes 8 minutes. So If I load this table directly into the DW and do an incremental load, am thinking this would take more time than loading the view into a staging table first. Also, since the load is going to take time, the DW availability would also be affected as the data won’t be available to users for querying.

What do you guys suggest? Is the staging approach outdated now? I want to understand what the pros and cons are. Thanks in advance for your help

Upvotes: 1

Views: 182

Answers (1)

Dan Bracuk
Dan Bracuk

Reputation: 20794

I help maintain a data warehouse and while we don't use a staging database, we do use staging/working/intermediate/whatever_you_want_to_call_it tables.

The gist of what we do is this. We receive the raw data as a series of delimited files. We then do whatever we deem necessary to these files to produce load files. We then populate our working tables from the load files and do whatever we have to do to further prepare the data. Then we populate the real tables from the working tables.

We also do everything as a scheduled job, early in the morning before people come to work, to minimize the liklihood of people trying to query the warehouse while data is being loaded.

Upvotes: 1

Related Questions