Reputation: 1365
I am new to dimensional modeling and have read a lot of material (star-schema, dimension/fact tables, SCD, Ralph Kimball's - The Data Warehouse Toolkit book, etc). So I have a good conceptual understanding of dimensional modeling constructs but finding it hard to apply to a usecase due to lack of experience and need some guidance.
Consider Twitter for example, I want to design a dimensional model to calculate -
These metrics over a period (like month) is the summation of these metrics on each day in that period.
I want to write SQLs to calculate these metrics for every quarter by region (eg: US and rest of world) and calculate year-over-year growth (or decline) in these metrics.
Eg:
Here are some details that I thought about -
Factless (transaction) fact table for user login activity with grain of 1 row per user per login : user_login_fact_schema (user_dim_key, date_dim_key, user_location_dim_key, access_method_dim_key)
Factless (transaction) fact table for user activity with grain of 1 row per user per activity : user_activity_fact_schema (user_dim_key, date_dim_key, user_location_dim_key, access_method_dim_key, post_key, activity_type_key)
Does this sounds correct? How should my model look like? What other dimensions/facts can I add here?
Wonder if I should collapse these 2 tables into 1 and have activity_type for logins as 'login', but there can be a huge number of logins without any activity so this will skew the data. Am I missing anything else?
Upvotes: 4
Views: 2510
Reputation: 2568
Your model seems correct, it answers the questions on the graph you posted.
It could make sense to aggregate those two fact tables into one fact table joined with a "UserAction" dimension, mostly because a login can be interpreted as just another user action.
However, having separate fact tables focused on one metric (or process) may be preferable because it enables you to introduce measures/metrics into the tables, i.e. when your fact tables stop being factless. It also spares you a join with another dimension (UserAction) but that is becoming a bit less relevant these days, where storage and DB processing power are just getting cheaper.
Upvotes: 4
Reputation: 555
You should keep the data on different tables to make sure you dont mix different grains.
user_login_fact_schema can be a materalized view based on user_activity_fact_schema filtering for activity type=login and including some logic to exclude duplicates (i.e. one login per user per day, if you are talking about daily active users)
Upvotes: 2