I am building a SSAS cube that includes several time dimensions that I want users to be able to use in their analysis. For example suppose I have the following dates (imagine some sort of work item tracking system) Created date Started date Closed date Last edit date Users might want to do the following See the average number of days between the date an item was created and the date it was started See the maximum number of hours between an item being started and it being closed See the minimum number of hours between the started date and the closed date etc... In short, they want to be able to see date / time comparisons between most (if not all) of these dates. How can I provide this in a system with many dates? (preferably without adding a bazillion "difference between X and Y" fields)

Reputation: 86779

Measuring date and time ranges in SSAS

I am building a SSAS cube that includes several time dimensions that I want users to be able to use in their analysis. For example suppose I have the following dates (imagine some sort of work item tracking system)

Created date
Started date
Closed date
Last edit date

Users might want to do the following

See the average number of days between the date an item was created and the date it was started
See the maximum number of hours between an item being started and it being closed
See the minimum number of hours between the started date and the closed date
etc...

In short, they want to be able to see date / time comparisons between most (if not all) of these dates.

How can I provide this in a system with many dates? (preferably without adding a bazillion "difference between X and Y" fields)

Upvotes: 2

Answers (2)

Mike Honey

Reputation: 15037

I would create a new fact, based on the existing fact, with a row per each existing fact row, per each meaningful date combination. Eg 1 row for 'Created Date to Closed Date', another row for 'Created Date to Last Edit Date'. I would limit this to the meaningful combinations, e.g. no rows for 'Closed Date to Created Date'. There may be additional useful combinations comparing a date to 'Today'.

I would build this using a View, with a series of SELECT and UNION ALL clauses - one per meaningful date combination. The design of each SELECT clause would be to return all rows and columns from the existing fact data, plus a character column to describe the meaningful date combinations, plus a SQL calculation of the datediff in days and another in hours.

Then I would add this new View to the cube, "copying" (I wish) the dimension relationships from the current fact. I would create Avg, Min and Max variants of the datediff in days and also hours (6 measures).

Finally I would build a fact (degenerate/junk) dimension based on the column to describe the meaningful date combinations.

Upvotes: 0

FrankPl

Reputation: 13315

Keep it

A simple approach could be to just implement these measures, as for four date types there would only be six different differences, and applying three different aggregations, you would end up with 18 measures, which would be reasonable. If you use the "Display Folder" property of the measures (and your client tool uses these), I would think users would be able to easily find their way around with even forty or fifty measures. And there will for suer be some combinations that do not make sense from a business point of view, even if you could technically implement them.

Utility Attributes

Another way would be to look at this from a dimensional point of view, as you are using a multidimensional cube anyway: Create a utility dimension with three attributes:

"From date" would have four members "Created", "Started", "Closed", and "Last Edit",
"To date" would have one member for each date type as well,
and "Aggregation" would have three members "Minimum", "Maximum", and "Average".

This dimension can be completely unrelated to any measure group. The table on which it is based would contain all (valid) combinations of these attributes.

You would only use a single measure "time diff", which would probably be a calculated measure.

Possibly rename the attributes and members as well as the measure in a way that is easy to understand for end users.

Then you would implement in the calculation script the calculation of measure "time diff" in a way that checks the CurrentMember of all three attributes, and then does the appropriate calculation. This could look similar to:

CASE
     WHEN IsError([Utility].[From Date].CurrentMember) // multi selection
       OR [Utility].[From Date].CurrentMember IS [Utility].[From Date].[All]
       OR IsError([Utility].[To Date].CurrentMember)
       OR [Utility].[To Date].CurrentMember IS [Utility].[To Date].[All]
       OR IsError([Utility].[Aggregation].CurrentMember)
       OR [Utility].[Aggregation].CurrentMember IS [Utility].[Aggregation].[All] THEN
          NULL // these special cases are invalid
     ELSE
         CASE
             WHEN [Utility].[Aggregation].CurrentMember IS [Utility].[Aggregation].[Minimum] THEN
                 Min(Leaves(), 
                     CASE
                         WHEN [Utility].[To Date].CurrentMember IS [Utility].[To Date].[Started] THEN
                              Measures.StartDate
                         WHEN [Utility].[To Date].CurrentMember IS [Utility].[To Date].[Closed]
                              Measures.ClosedDate
                         ...
                      END
                      -
                     CASE
                         WHEN [Utility].[From Date].CurrentMember IS [Utility].[From Date].[Started] THEN
                              Measures.StartDate
                         WHEN [Utility].[From Date].CurrentMember IS [Utility].[From Date].[Closed]
                              Measures.ClosedDate
                         ...
                      END
                      )
             WHEN [Utility].[Aggregation].CurrentMember IS [Utility].[Aggregation].[Minimum] THEN
                 Max(Leaves(), 
                     ... // repeat code from above
                    )
             WHEN [Utility].[Aggregation].CurrentMember IS [Utility].[Aggregation].[Average] THEN
                 Avg(Leaves(), 
                     ... // repeat code from above
                    )
         END
END

This code might need some performance tuning, depending on the size of the cube, but you should get the idea: just check all cases of the attribute and implement the appropriate calculation.

The disadvantages of this second approach would be that users would need some education, as for the "time diff" measure to work, they will have to add three more attributes to every query, and apply selections as appropriate. Depending on the client tool, it is not possible so have three attributes with individual selections e. g. in two columns like

Attrib |  Col 1   |   Col 2
From   | Created  |  Started
To     | Started  |  Closed
Agg    | Min      |   Max

but only cross products of selected members.

And you have to take care/give understandable messages to users for cases like multi-selections or no selection at all (i. e. CurrentMember is the All member) for the utility attributes.

After all the disadvantages, here is a description how the utility dimension approach can be applied to a different subject, in this case, time calculations like Year over Year growth and Year to Date: http://sqlblog.com/blogs/marco_russo/archive/2007/09/02/datetool-dimension-an-alternative-time-intelligence-implementation.aspx

Mixed Approach

Finally, you could use a mixed approach, where you e. g. create measures for all combinations of date types, but a utility attribute for the aggregations.

Upvotes: 1

Measuring date and time ranges in SSAS

Answers (2)

Related Questions