VB_

Reputation: 45712

Star Schema with multiple report-group-specific aggregations: good or bad

Objective

We're building an Enterprise DWH for business self-service: tens of TBs of data and about 30 business users. The flow looks like: Sources -> ETL -> DWH -> Power BI -> User.

Transaction-grain facts may contain billions of rows, plus non-additive measures and KPIs. Because of that, neither external in-memory cubes (tabular model) nor PBI import mode is an option for us. At the same time, we have very strict performance requirements: PBI visualizations shouldn't take more than 15 seconds to render.

For the sake of performance and usability, we ended up with the PBI team defining materialized views (at the DWH layer) to build multiple (not too many at this point) aggregated derivatives from each transactional fact table. Each derivative is just a more aggregated fact table plus pre-calculated/aggregated KPIs.
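
For illustration only (the table and column names below are hypothetical, and the exact syntax depends on the DWH engine), a derivative of a sales fact might look roughly like this:

    -- Hypothetical daily aggregate derived from a transaction-grain fact.
    -- Only additive components are stored; KPIs are calculated on top of them.
    CREATE MATERIALIZED VIEW dwh.fact_sales_daily AS
    SELECT
        date_key,
        store_key,
        product_key,
        SUM(sales_amount)    AS sales_amount,      -- additive, safe to roll up
        SUM(quantity)        AS quantity,
        SUM(discount_amount) AS discount_amount,
        COUNT(*)             AS transaction_count
    FROM dwh.fact_sales_transactions
    GROUP BY date_key, store_key, product_key;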

The issue

Partly because governance hasn't been implemented yet, and maybe because of the number of tables & KPIs, business users find the transaction-grain star schema too complex (and sometimes slow), and they tend to use only the derivative aggregated facts for data exploration. I have the feeling that the transaction grain will be used only by the Power BI team, and I can't say how many derivatives we'll end up with for each transactional fact table (it depends, maybe 5 to 10).

Question

Is the approach we're taking now a standard (best-practice) approach? Should we encourage our business users to use the transactional facts? Or is creating 5 derivative aggregations and putting the burden on the Power BI team a good approach?

P.S.

How common is the 15-second requirement for PBI reports? Meaning, when the user selects any slicer value, the report should be refreshed in < 15 seconds. Isn't that too strict a threshold?

Upvotes: 2

Views: 205

Answers (1)

David Browne - Microsoft

Reputation: 89361

Is the approach we're taking now a standard (best-practice) approach?

Yes. Building partial aggregates, either with (materialized) views or in Power BI in-memory tabular models, is completely normal. These are just "data marts", and they're built for a particular purpose and a particular audience. There's an inherent tension between a full-fidelity model that captures all the relevant facts and dimensional attributes for an enterprise, and a model that's simple to navigate and answers questions for a particular purpose or perspective.

And there's no real way to define such measures in the DWH, as non-additive measures can't be calculated at the lowest grain or at any intermediate grain. So you really need the tabular models to define standardized, reusable calculations.
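
To make that concrete with a hypothetical example (reusing the illustrative sales tables from the question): an average discount rate is a non-additive KPI, so an aggregate table can only carry the additive numerator and denominator, and the ratio itself has to be computed at query time, which is exactly the kind of standardized calculation a tabular-model measure provides.

    -- The non-additive KPI is computed at query time from additive components.
    -- Averaging pre-computed per-row ratios instead would weight every
    -- day/store/product equally, regardless of sales volume, and give a
    -- different (wrong) answer.
    SELECT
        SUM(discount_amount) / NULLIF(SUM(sales_amount), 0) AS avg_discount_rate
    FROM dwh.fact_sales_daily;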

How common is the 15-second requirement for PBI reports?

Quite common. It's an interactive reporting tool, and it often takes several separate queries to refresh a report page, so a query response time of 10 seconds or more leads to a very poor user experience.

Should we encourage our business users to use the transactional facts?

Some will thrive going directly to the lowest grain and having access to all the data, so you shouldn't discourage it. But most will not, and will want a more curated view of the data to start from.

Or is creating 5 derivative aggregations and putting the burden on the Power BI team a good approach?

Think about it this way. Whether it's your end users/analysts building the models or your Power BI team, the result is the same. Starting with your DWH layer, a model is built to select relevant data, define meaningful measures, and provide acceptable performance. That model might be just for a single report, or it might be shared by a whole department.

Upvotes: 3
