Reputation: 31
I have a scenario where I need to build a warehouse for recording rental activity. Basically, I have 3 tables: rental_orders, rental_events, and rental_charges. The schemas for the tables are below.
rental_orders:
create table rental_orders (
    order_id             varchar(256),
    order_item_id        varchar(256),
    rental_credit_amount numeric(28,4),
    rental_credit_unit   numeric(28,4),
    customer_id          varchar(512),
    country_code         varchar(56),
    shipping_cost        numeric(28,4),
    return_cost          numeric(28,4)
);
create table rental_events (
    order_item_id     varchar(256),
    event_id          varchar(256),
    event_type        varchar(512),
    charge_id         varchar(1024),
    event_start_date  timestamp,
    event_end_date    timestamp,
    extension_type    varchar(256),
    credit_amount     decimal(28,4),
    due_date_original timestamp,
    due_date_extended timestamp
);
create table rental_charges (
    charge_id   varchar(1024),
    customer_id varchar(512),
    charge_type varchar(1024),
    amount      decimal(28,4),
    response    varchar(1024)
);
The above schema follows these rules: an order_item_id in the rental_orders table can have multiple events, which are present in the rental_events table, and each event's charge_id links to a charge in the rental_charges table. So, to design a denormalized fact table from the above schema, I thought of using an accumulating snapshot fact table, but in my scenario there are 35 distinct events that can happen for an order_item_id, and my business wants to record data and timestamps for every event. Since each event can have its own extension_type, credit_amount, due_date_original and due_date_extended, an accumulating snapshot would need 35 * 4 = 140 columns, which are too many columns to maintain.
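For illustration, the snapshot I was considering would look something like this (the delivery and return event names are made up; the real table would repeat this four-column group 35 times):

-- Sketch of the accumulating snapshot fact table I had in mind.
-- Each event type contributes a four-column group, so 35 event
-- types give 35 * 4 = 140 event columns. Event names are made up.
create table rental_order_item_snapshot (
    order_item_id              varchar(256),
    delivery_extension_type    varchar(256),
    delivery_credit_amount     decimal(28,4),
    delivery_due_date_original timestamp,
    delivery_due_date_extended timestamp,
    return_extension_type      varchar(256),
    return_credit_amount       decimal(28,4),
    return_due_date_original   timestamp,
    return_due_date_extended   timestamp
    -- ... 33 more four-column groups
);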
Can anyone suggest a better approach to designing the fact tables for the above schema?
Upvotes: 1
Views: 714
Reputation: 5739
In fact, a single table with 140 columns is not very large or unusual for a data warehouse such as Redshift; a wide denormalized table is usually considered good design there. Redshift itself supports up to 1,600 columns in a single table.
I understand the challenge of transforming normalized data into a denormalized form, but if you come up with good ETL scripts and processes, it would be the better design in my opinion. We have a similar use case with an e-commerce category tree structure: our OLTP source has only 4 columns, which we translate into 300+ columns in Redshift, and it works well.
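As a sketch of that ETL step, the pivot from event rows to wide columns can be done with conditional aggregation. The target table rental_order_item_snapshot, its column names, and the 'DELIVERY' event type are all assumed here, not taken from your schema:

-- Hypothetical ETL sketch: pivot rental_events rows into one wide row
-- per order item. Repeat the four max(case ...) expressions for each
-- of the 35 event types; 'DELIVERY' is an assumed event_type value.
insert into rental_order_item_snapshot (
    order_item_id,
    delivery_extension_type,
    delivery_credit_amount,
    delivery_due_date_original,
    delivery_due_date_extended)
select
    o.order_item_id,
    max(case when e.event_type = 'DELIVERY' then e.extension_type end),
    max(case when e.event_type = 'DELIVERY' then e.credit_amount end),
    max(case when e.event_type = 'DELIVERY' then e.due_date_original end),
    max(case when e.event_type = 'DELIVERY' then e.due_date_extended end)
from rental_orders o
left join rental_events e on e.order_item_id = o.order_item_id
group by o.order_item_id;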
Hope it helps.
Upvotes: 2
Reputation: 3465
Based on your description of the issue you are facing, you can use an approach I call a 'pivoted fact': it essentially stores the values in rows, with a link to the parent table, instead of spreading them across many columns.
Create a table event_details:
create table event_details (
    event_detail_id integer,       -- primary key for this table
    event_id        varchar(256),  -- link to the parent rental_events table; use what makes sense for you
    extension_type  varchar(256),  -- additional attributes for each event
    credit_amount   decimal(28,4),
    due_date_original timestamp,
    due_date_extended timestamp
);
With this you can have one or more detail rows for each event instead of creating 140 columns. You can always create derived tables for selected event types, etc., based on your requirements.
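For example, pulling the details for one event type back out is a simple join ('DELIVERY' is an assumed event_type value):

-- Fetch all detail rows for one event type.
select e.order_item_id,
       e.event_type,
       d.extension_type,
       d.credit_amount,
       d.due_date_original,
       d.due_date_extended
from rental_events e
join event_details d on d.event_id = e.event_id
where e.event_type = 'DELIVERY';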
Also, you might want to consider using integer or bigint for your ids if possible, as a best practice and for performance.
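For instance, assuming Redshift as the target, a bigint surrogate key can be generated with an IDENTITY column; this is the same table sketched with integer keys:

-- event_details with integer surrogate/foreign keys (Redshift syntax).
create table event_details (
    event_detail_id bigint identity(1,1),  -- auto-generated surrogate key
    event_id        bigint,                -- link to parent table, assuming its key is also bigint
    extension_type  varchar(256),
    credit_amount   decimal(28,4),
    due_date_original timestamp,
    due_date_extended timestamp
);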
Upvotes: 1