Reputation: 2090
I'm trying to model my data warehouse using a star schema but I have a problem to avoid joins between fact tables.
To give a trivial idea of my problem, I want to collect all the events who occur on my operating system. So, I can create a fact table event
with some dimensions like datetime
or user
. The problem is I want to collect different kinds of event: hardware event and software event.
The problem is those events have not the same dimensions. By instance, for a hardware event, I can have physical_component
or related_driver
dimensions and, for a software event, software_name
or online_application
dimensions (that is just some examples, the idea to keep in mind is the fact event
can be specialized into some specific events with specific dimensions).
In a relational model, I would have 3 tables organized like that:
The problem is : how to handle joins between fact tables in a star schema?
JOIN
SQL statement in all of our queries.
NULL
value.
event
and in its own table. Because we will store a huge quantity of data, I'm not sure this duplication is a good idea.
Upvotes: 6
Views: 7161
Reputation: 4439
Events should be a single fact. If you split them in two, you'll have a difficult time doing aggregations across both.
If necessary, you can have separate hardware and software attribute dimensions, but you should have a generic event dimension, even if it is just a junk dimension with a few simple attributes, e.g. type (hardware/software), criticality (high, low), etc.
On a side note, I've generally seen the diagrams with the arrows coming from the fact going to the dimensions. The fact table keys look at the dimensions rather then the other way around.
Upvotes: 0
Reputation: 2279
From your description and your subsequent comments to other answers, I'd say that option 2 or option 4 are the right way to model things from a dimensional modelling perspective. Each fact should be a measure of a business process, and the dimensionality of software and hardware events seems to be sufficiently different that they warrant being stored separately.
Then, there's a case for also storing the separate events table as a view, materialised view, or plain-ol' table storing the things that are common.
Once you've decided that's the right way to model things 'logically', you then need to balance performance, maintainability, usability and storage. For dimensional modelling, usability and performance of queries take top priority (otherwise you may as well not use a dimensional model at all), and the extra work in ETL, and extra space needed, are prices worth paying.
A non-materialised view would save you the space at the price of performance, but it could be that you could give it a sufficiently awesome index or two that would mitigate that. A materialised view will give you performance at the price of storage.
I'd be tempted to create the two fact tables with indexes and a non-materialised view, and see what performance of that is like before taking further performance enhancing steps. 10 million fact rows isn't so bad, it might still perform.
A materialized view can be queried directly. But if you want to, you can use the query rewrite capabilities of Oracle so that the Materialized view is instead used as a performance-enhancer, like an index, when you're querying the original tables. See here for details: http://www.sqlsnippets.com/en/topic-12918.html Whether you choose to use it in query rewrite mode or just as a view in its own right depends on whether you want the users to know about this extra table, or for it to just sit in the background as a helpful friend.
Upvotes: 4
Reputation: 89671
You would never/rarely join fact tables together. You may join aggregated facts which share (conformed) dimensions (i.e. Number of software events per hour compared with number of hardware events per hour).
To me, you always have to consider the kinds of questions that are going to be asked when looking at dimensional modeling.
Upvotes: 0
Reputation: 2195
There doesn't seem to be a reason in your scenario to combine or link the two types of events. Having said that, you may have some reason you did not describe (for example, collecting logs from many systems and wanting to see them together easily).
So my advice is to make a single fact table with both hardware and software dimension keys. One of them is always going to be 0 or -1 (= default 'n/a' record).
This allows you to aggregate them together without UNION statements or other complicated logic and can even support events that are linked to both hardware and software if they appear in the future.
Upvotes: 0