Argho Chatterjee
Argho Chatterjee

Reputation: 599

Difference between sql query aggregation and aggegration and querying an OLAP cube

I have a query with respect to the advantages of building a OLAP cube vs aggregating data in database table for querying ,data of say 6 months and then archiving the sql table later for analytics purpose.

Which one is better, table or OLAP cube? and why since I can aggregate and keep data in my tables also and query the aggregated data as and when needed.

Upvotes: 1

Views: 1638

Answers (3)

whytheq
whytheq

Reputation: 35577

If there are very regular use cases which involve specific aggregated data, and this aggregated data would take a while to return from sql database tables then a cube might help.

If there are lots of potential ways in which your db table data needs to be sliced and diced at an aggregated level then there is definitely a good argument to start playing around with olap cubes.

In terms of sums of data olap is a great aggregation tool. I'm not convinced that it is the best tool for distinct counts though, so if your requirements includes lots of distinct counts then maybe look elsewhere. Do you have the option of Tabular/PowerPivot/DAX ?

Upvotes: 0

Ron Dunn
Ron Dunn

Reputation: 3078

To add to Jo's great answer, consider the grain of the facts that need to be aggregated and compared. If you have daily sales by product, but budgets by month and product category, you're going to need an aggregate fact table based on sales in order to compare budgets. That would be further represented as two cubes in your OLAP database - Sales cube, and Budget cube.

Upvotes: 1

Jo Douglass
Jo Douglass

Reputation: 2085

Short version: Like many development decisions, it depends.

Long version: I wouldn't say that one is "better" than the other - it's just that the two have separate uses and one or the other might be the better solution depending on what the requirements are.

If you have a few specific reports which require specific aggregations, then it might be simpler and easier for everyone involved to just aggregate that information in a table or a view, and point your reports at that.

As an example, if you know your users only want reports at a monthly level for a particular set of parameters - maybe your sales department want the monthly value of each salesperson's sales, for example - then your best bet might be to aggregate this up and pop it into a report where they can select the month and the salesperson, and get the number that they want.

The benefits of this might be that it's quick to develop and provide to your users, there's not too much time spent testing as only a few figures need checking, etc. Your users also don't need to spend time being trained/learning to use a cube - reports are generally pretty easy for people to pick up and use.

But if your users want to be able to carry out much more open-ended analysis on their own terms then it's not much use if you need to go away and develop a report every time they have a new requirement. Your database might start getting very full of similar-but-different tables full of aggregated amounts. You could run into issues where one report ends up not agreeing with another for some reason - you might find you're dealing with the same data quality issues over and over again in each report.

In this case, it might make more sense to develop a cube over the top of data held at the lowest grain which your users want to analyse. In this way, they can essentially self-serve, rather than getting back in touch with you every time they need a new set of aggregated data. They can slice and dice through the data using multiple different "parameters" (dimensions in the OLAP world), rather than being limited by the nature of the reports.

Aggregated data still sometimes plays a role even when you have a cube in place, though. Sometimes performance gains can be found by aggregating data up to certain levels and holding it in a physical table, and getting your OLAP tool to use the physically aggregated data at that level instead of using its own aggregations - but this is an optimization step which would need careful consideration to see whether it's beneficial in terms of performance, whether the space vs. performance payoff is worthwhile, etc. I wouldn't worry about this aspect if you're just starting to look at OLAP, but wanted to note it for the sake of completeness.

Upvotes: 2

Related Questions