jl6
jl6

Reputation: 6422

Should a data warehouse ever be a primary data repository?

Typically a data warehouse is a means to consolidate multiple source systems, usually for reporting purposes. But are there any situations where it is a good idea to use the data warehouse as a primary database in its own right (i.e. capturing and managing data, not loading it from another system)?

Upvotes: 1

Views: 1455

Answers (3)

Marcus D
Marcus D

Reputation: 1144

I would have to say generally not, but ... we are being asked to build a DWH that has as one of it's source systems an "OLTP" system that collects Twitter data. In this case I think the Twitter collecting "OLTP" system only keeps 1 week worth of data and each day passes it across to the DWH for storage, then each day we feed summary information back to the Twitter OLTP system. In this case we will become the primary data repository for the Twitter data ... to be honest though this is the first time this is the case for us.

I would disagree with @Sathish Senathi I think on the statement that "Datawarehouse will retain data say upto 5 /10 years (not all details, probably summarized)". I would never summarise data going into a DWH, always keep the lowest grain transaction data, but obviously not all attributes that are in the OLTP system.

I admit to not having worked on online gambling which has very large data, but Ive worked in telecoms and insurance which have multi Terabyte DWHes. We always store down to the details of the phone call or premium/claim transactions themselves.

Upvotes: 0

Sathish Senathi
Sathish Senathi

Reputation: 215

Good recap by @tobixen. One more thing about Datawarehouse best practice is you don't modify the data, you generally append the data ( so history is kept).

Most datawarehouse technologies are geared toward this pattern. So using a typical OLTP system or Master Data System over datawarehouse wont work properly.

The general flow is

OLTP Database -> Datawarehouse > Archiving.

OLTP Database may retain data for say 1 month to 13 months. Datawarehouse will retain data say upto 5 /10 years (not all details, probably summarized) and then goes to archiving.

Being said that BigData technologies like Hadoop / Hive is making it easier to store large amount of data ( say 10 years) and process them in cost effective fashion.

Upvotes: 0

tobixen
tobixen

Reputation: 4073

The short answer - no, I don't think so. You should first be designing a good, normalized database structure for the production database - and later worry about the data warehouse part.

Keeping the warehouse and the production database separate is considered "best business practice", perhaps most importantly it's a design consideration. The data warehouse and the production database serves two different purposes. The perhaps most important function of the production database is to capture transactions reliably, consistently and unambiguously. This matters both when designing the database and when choosing the software, i.e. database engine.

The design part of it shouldn't be underestimated - I'd say that in most projects, a good database design is one the very first thing one should be doing. Getting the table structure right is more important than choosing technology.

If you're planning to grow, it may also make good sense to try and isolate the data a bit into different databases - both because it makes it easier to split up the database later and run different parts on different hardware (when you hit the "bleeding edge", it's a lot cheaper to buy more servers than to buy more powerful servers) and because you may later want to split out software modules or reuse them independently of each other. You wouldn't want to split up the data warehouse the same way.

So to recap, start with the production database, and build a data warehouse later - that can typically be postponed until you feel the urge to combine data from different sources or to add redundancy into the production database to achieve faster reports.


I was working for an online gambling company for 8 years. I participated in the design of the production database, and I was later responsible for building the data warehouse - so the comments above are based on some of the mistakes done and experiences learnt.

Upvotes: 5

Related Questions