Amous
Amous

Reputation: 514

Loading date or datetime into date dimension

Let's say I have a date dimension and from my business requirements I know that the most granular I would need to go is to examine the specific day of the month that an event occurred.

The data I am given provides me with the exact time that an event occurred (YYYY-MM-DD HH:MM:SS). I have two opitons:

  1. Before loading the data into the date dimension, slice the HH:MM:SS from the date.
  2. Create the time attributes in my date dimension and insert the full date time.

The way I see it, I should go with the option 1. This would remove redundant data and save some space. However, if I go with option 2, should the business requirements ever change or if my manager suddenly wants to be more granular I wouldn't need to modify my original design. Which option is more commonly used? Are there more options that I did not consider?

Update - follow up question

I receive new data every month. If I used a pre built date dimension with all the dates would I then need to run my script every month to populate the table with new dates of that month or would I have a continuous process where by every day insert into the table one row, which would be that date?

Upvotes: 0

Views: 1663

Answers (1)

Rich
Rich

Reputation: 2279

I would agree with you and avoid option 2. A standard date dimension table is at the individual date level. If you did need to analyse by time of day, you could create an additional time of day dimension at the level of a second in a single day, and link to that from your fact table.

Your date dimension should be created by script automatically, rather than from the dates that events occurred. This allows you to analyse across a range of events from other facts, and on dates where no events occur, using a standard, prebuilt dimension.

I would also include the full date/time stamp as a column in the fact table, along with the 'DateKey' to the dimension table. This would allow you some visibility/analysis of the timestamp, you would not lose the data, and would still allow you to analyse by the date dimension.

Update - follow up question

Your pre-built date dimension (the standard way of doing it) would usually contain some dates in the future. There's no reason not to, for example, include another 5 years of dates in the table. But if you'd like it to gradually grow over time, you could have a script that is run once a day, once a month, or once a year to add new dates. Its totally up to you! There are many example scripts for building date dimensions- just google date dimension script. They exist for the language of your choice, e.g. SQL, C#, Power Query, etc.

Upvotes: 3

Related Questions