Reputation: 1840
I am building a time dimension for only time in my data warehouse. I already have a date dimension.
How do I denote an unknown time? In my DimDate dimension, I marked 01/01/1753 as being reserved for unknown dates, but I think a time will be a bit harder. We don't allow NULLs in our fact tables. How do I do this, and what might that row look like?
Upvotes: 3
Views: 634
Reputation: 26
You state the "We don't allow NULLs in our fact tables " but ask "How do I denote an unknown time?"
Assuming you are using in your FACT table a data type TIME + enforce a NOT NULL constraint on data arriving from source system => you simply cannot insert unknown\invalid time into your fact and hence should have no problem.
The obvious exception to the above is an invalid business wise value reported by the source system such as Sunil proposed ('00:59:59.9999999') but this is very uncommon, unstable solution for obvius reasons (changing requirements can easily turn this value into a valid one)
If you chose to allow (and i hope you did) records with NULL values or invalid dates from your source system to enter the fact then the best practice would be using surrogate keys on our DimTime and insert them as FK into your FACT tables – this will easily allow you to represent valid + invalid values in your dimension. This approach can easily also support the approach of an invalid business wise value ('00:59:59.9999999'), such a value gets an FK_DimTime=-1.
I strongly advise on allowing specific types of garbage from source systems to enter the FACT (i.e – invalid\missing\NULL date\time values) tables as long as you clearly mark it in relevant DIMs as this tends to drive Users to improve data quality in source systems.
Here is some background on the matter https://www.kimballgroup.com/1997/07/its-time-for-time/ https://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/
Upvotes: 1
Reputation: 381
Just create a DimTime
records with a -1
technical surrogate key and populate to the time column a value '00:59:59.9999999
'. This way this will be a unlikely time ever captured (accuracy to the last digit) by your DWH, it will always equate to a unknown in your reports or queries when you want to put filter like,
EventTime < @ReportTime AND EventTime <> '00:59:59.9999999'
Hope this is viable solution to your problem.
Upvotes: 0
Reputation: 16260
It can look like anything you want. Most dimensions have a 'display name' of some kind, so your dimensions could look something like this:
create table dbo.DimDate (DateID int, DateValue date, DisplayDate nvarchar(20))
go
-- this is an unknown date; 1753-01-01 is only there because we need some valid date value
insert into dbo.DimDate values (1, '1753-01-01', 'Unknown')
go
-- this is the real date 1 Jan 1753
insert into dbo.DimDate values (2, '1753-01-01', '01 Jan 1753')
go
create table dbo.DimTime (TimeID int, TimeValue time, DisplayTime nvarchar(20))
go
-- this is an unknown time; 00:00 is only there because we need some valid time value
insert into dbo.DimTime values (1, '00:00', 'Unknown')
go
-- this is the real time value for midnight
insert into dbo.DimTime values (2, '00:00', 'Midnight')
go
Of course, this assumes that your reporting tool and users use the DisplayDate
and DisplayTime
columns for filtering instead of the DateValue
and TimeValue
columns directly, but that's simply a matter of training and standards and whatever solution you adopt needs to be understood anyway.
There are other alternatives such as a flag column for unknown values, or a convention that a negative TimeID
indicates an unknown value. But those are less obvious and harder to maintain than an explicit row value, in my opinion.
Upvotes: 0