mclark1129

Reputation: 7592

Performing Calculations on a Subset of OLAP data in MDX

I'm trying to figure out a way I can filter out data in my cube so that I can perform time-series calculations such as a moving average using only that subset.

For example, let's say that I have a fact table with the following columns: DayId, HourId, and Value.

I also have a Time dimension with a composite key of DayId and HourId. This dimension has a key for every hour over the span of 100 days, so the keys go from (1,1) to (100,24).

In the fact table there is a value for every point in time, so it looks like this:

DayId HourId Value
1     1       50
1     2       60
1     3       75.2
...   ...     ...
100   23      87
100   24      89

Now, suppose I want to calculate a daily moving average from the beginning of time through some arbitrary point in the middle of a day. Basically, I want to average the last point of every day except the final one, for which I would use a different point in the middle of the day. If I were to compute a moving average from day 1 to day 10, ending at noon of the 10th day (HourId 12), the data I would use for my calculation would look like:

DayId HourId Value
1     24     80
2     24     90
3     24     39
4     24     60
...   ...    ...
9     24     10 
10    12     30

In SQL, I could retrieve a set like this pretty easily:

SELECT 
    *
FROM
    [FactTable]
WHERE
    ((DayId BETWEEN 1 AND 9) AND (HourId = 24))
    OR ((DayId = 10) AND (HourId = 12))
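
To make the target number concrete, here is a minimal sketch that runs the same WHERE clause against an in-memory SQLite table and averages the result. The data is hypothetical (Value is simply DayId * 100 + HourId, invented for illustration), and Python with sqlite3 is only a convenient stand-in for the real fact table:

```python
import sqlite3

# Hypothetical data: 10 days x 24 hours, Value = DayId * 100 + HourId (made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactTable (DayId INTEGER, HourId INTEGER, Value REAL)")
conn.executemany(
    "INSERT INTO FactTable VALUES (?, ?, ?)",
    [(d, h, d * 100 + h) for d in range(1, 11) for h in range(1, 25)],
)

# Same filter as the SQL above: hour 24 of days 1-9, plus hour 12 of day 10.
rows = conn.execute(
    """
    SELECT DayId, HourId, Value
    FROM FactTable
    WHERE ((DayId BETWEEN 1 AND 9) AND (HourId = 24))
       OR ((DayId = 10) AND (HourId = 12))
    ORDER BY DayId
    """
).fetchall()

# The moving average I am after is the plain mean of those 10 values.
sma_10 = sum(value for _, _, value in rows) / len(rows)
```

With this made-up data the query returns exactly 10 rows, and `sma_10` is the figure the MDX calculation should reproduce for the (10, 12) tuple.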

I'm pretty new to OLAP and MDX, so I've really been struggling with the right way to do this. So far, the best I've been able to do is to perform a sub-select in my FROM clause, and essentially construct a tuple set of only the rows I want:

WITH
    MEMBER [SMA 10 Value] AS
    AVG (
        ([Time].[DayId].Lag(9):[Time].[DayId], [Time].[HourId])
        , [Value]
    )
SELECT
    {
      [Value]
      , [SMA 10 Value]
    } ON COLUMNS
    , ([Time].[DayId], [Time].[HourId]) ON ROWS
FROM
(
    SELECT
        [Measures] ON COLUMNS
        , {
            ([Time].[DayId].[1]:[Time].[DayId].[9], [Time].[HourId].[24])
            , ([Time].[DayId].[10], [Time].[HourId].[12])
    } ON ROWS
    FROM
        [Cube]
)

However, it doesn't quite work for my calculations. The moving average seems to be correct over the first 9 days, because their tuples all have the same hour ID, but when I get to the final day, instead of using the values from the previous 9 tuples, it averages over the previous 9 days at HourId 12.

What am I doing wrong here? Is there a better way to filter my time dimension down so that unwanted rows are excluded from my calculations?

Upvotes: 1

Views: 871

Answers (1)

Bill

Reputation: 4585

I'm a bit new to MDX, so take it for what it is worth, but here is my solution for you.

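With
-- Note: in engines such as SSAS a query-scoped named set is evaluated once,
-- not per cell; if so, inline this UNION inside the Avg call instead.
SET [AvgOver] as
  UNION(
    -- Hour 24 of each of the 9 previous days
    ({[Time].[DayID].CurrentMember.Lag(9):[Time].[DayID].CurrentMember.Lag(1)},
        [Time].[HourID].[24]),
    -- The current day at the current hour
    {([Time].[DayID].CurrentMember, [Time].[HourID].CurrentMember)}
  )
MEMBER [SMA 10 Value] as
  Avg([AvgOver], [Value])
Select
  {[Value], [SMA 10 Value]} on Columns,
  {([Time].[DayID].[10], [Time].[HourID].[12])} on Rows
From [Cube]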

I broke the set construction out into a separate block because that seems to be the hard part. Get that right and the Avg is easy.

You say you want an average using the selected value from the current day and the final value from the 9 previous days. That to me suggests a UNION function to put two well-defined sets together.
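
As a rough way to check the arithmetic outside the cube, the same union of two sets can be sketched in Python. This is only an illustration under assumptions: `avg_over` and `sma` are hypothetical helper names, the data is made up (Value = DayId * 100 + HourId), and clamping the window at day 1 is my choice here, not a statement about how MDX handles a Lag past the first member.

```python
def avg_over(day, hour, window=10, last_hour=24):
    """Return the (DayId, HourId) keys to average for the current cell."""
    # First set: the closing hour of each of the previous window-1 days
    # (days before day 1 are skipped; that clamping is an assumption).
    previous = [(d, last_hour) for d in range(day - (window - 1), day) if d >= 1]
    # Second set: the current cell itself. The union is the two lists joined.
    return previous + [(day, hour)]


def sma(values, day, hour, window=10):
    """Average the values at the keys chosen by avg_over."""
    keys = avg_over(day, hour, window)
    return sum(values[k] for k in keys) / len(keys)


# Hypothetical fact data keyed by (DayId, HourId).
values = {(d, h): d * 100 + h for d in range(1, 11) for h in range(1, 25)}
result = sma(values, 10, 12)
```

For the (10, 12) cell this picks hour 24 of days 1 through 9 plus hour 12 of day 10, which is the same set of rows the question describes.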

Upvotes: 2
