Dmitri
Dmitri

Reputation: 413

Why I should break distinct count measures into separate measure groups?

When I am trying to create DistinctCount measure in VS 2008 OLAP project, it creates it in separate group instead original group that I want. Therefore, I created it in two steps. I created measure without aggregation in needed group and then set AggregationFunction to DistinctCount manually.

Then I've built project and deployed. Everything works fine except warning in OLAP project. My measure is highlighted by a blue wavy line and it says:

Break distinct count measures into separate measure groups

Why should I do it and is it safe to ignore this warning?

Upvotes: 2

Views: 895

Answers (2)

mprost
mprost

Reputation: 301

From the book Applied Microsoft Analysis Services 2005 And Microsoft Business Intelligence Platform, page 149:

When you indicate that you want a distinct count measure, the Cube Designer automatically creates a new measure group for you. While you can have a single distinct count measure defined in a measure group, there are good reasons to move it to a separate group. First, when you add a distinct count measure to a measure group, the entire measure group is treated as nonaggregatable. This reduces the effectiveness of the server-side cache by requiring more diskbased queries. Second, designing aggregations when a distinct count measure exists will result in much larger aggregations because the distinct count measure will internally be represented as a level. This level will participate in all aggregations and generally be quite large. Finally, the measure group records must be ordered by the distinct count field and ordering has to be done while SSAS queries the fact table (an ORDER BY condition is added to the SELECT statement). Since this operation could be really heavy for a large fact table, you should carefully consider the cost-benefit ratio before adding a Distinct Count measure to a cube.

Upvotes: 0

Hadi
Hadi

Reputation: 37358

Referring to the Microsoft white paper entitled "Analysis Services Distinct Count Optimization":

As a quick high-level overview of distinct count calculations, the distinct count measure is not designed like other measures (such as sum, count, max, min, and so on). Distinct counts are difficult to aggregate in comparison to other measures because the former are not fully additive. Because of this, the vast majority of distinct count queries are calculated at run time at the leaf level granularity, specifically the unique values of the distinct count measure. This means that the Analysis Services storage engine parses the source column and treats it almost like an attribute that has been GROUPED BY with an implicit count for each row.

Later they mentioned that:

As previously noted, this is because the distinct count measure is architecturally different than other measures. You cannot have more than one distinct measure because you cannot order a MEASURE GROUP by many measures. The distinct count measure is also at a lower level of granularity than SUM and COUNT measures. By adding other measures to the measure group holding a distinct count measure, all of the other measures will be at the same granularity as the distinct count measure, resulting in inefficient data structures and suboptimal queries. To perform distinct count aggregations, each distinct count measure still retains its unique combination of dimension key attributes.

You can download the white paper from the following link:

Upvotes: 0

Related Questions