Reputation: 11
Dear community,
I hope the headline gives you a hint of what I want to talk about / need advice.
I'm a BI Developer with 3 years of experience working on big BI projects - some on the health industry and some were on the finance industry when I was working at IBM. On my current job I came to a startup company, the company has an operational DB for the purpose of the product and the data is on SQL Server DB.
For 4 months I was putting fires out regarding all the mass my predecessor did and now I'm ready for the next step - Modeling the operational DB tables for DWH DB to be able to extract and use the data for analytical and BI purposes.
I don't have any resources at all - so I will build the DWH first on the operational DB and then my vision is the DWH will be on Snowflake DB after I will get resources from my CTO.
The modeling issue:
When I'm tackling the issue of data modeling I encountered some confusion about the right way to model data, there is the traditional way I'm familiar with IBM, but there are the Cloud DWH modeling and the hybrid approach.
My model need to be flexible and the data should be extract very fast.
what is the best way to store and extract data for analytical purposes?
Fact tables with a lot of dimensions - normalize approach
OR
putting all the data I need with regard to granularity at the same table (thinking about the future, moving to Snowflake) I will have several tables each one with is one granularity and his world.
I'm just interested to hear what some of you implemented at your company and if you have an advise or UC you can share, I searched at the web a lot and what I saw is a lot of biased info and very confusing - nobody is really saying what is working in the real world.
Thanks in advance!
Upvotes: 1
Views: 122
Reputation: 1258
Well two key points of normalisation are to reduce disk space used and optimise data retrieval; neither of which are all that relevant in Snowflake. Storage is dirt cheap. And for the best part, the database is self-optimised - worse case you might have to set up clustering keys on very large tables (see: https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html)
I've found that big tables with lots of columns perform better than many smaller tables with joins. For example when testing on a flat table with 10 mil rows, with a clustering key set up; it was about 180% faster than obtaining the same resultset but with a more complex model / multi-table.
If you're anticipating a lot of writeback and require object level changes, then you should still consider normalisation - but you'd be better off with star schema in that case.
Upvotes: 2