Nedunuri Rajesh

Reputation: 31

How to design a fact table when there are many events and their attributes

I have a scenario to build a warehouse for recording rental activity. Basically, I have three tables: rental_orders, rental_events, and rental_charges. The schemas for the tables are below.

rental_orders:

create table rental_orders
( order_id varchar(256),
order_item_id varchar(256),
rental_credit_amount numeric(28,4),
rental_credit_unit numeric(28,4),
customer_id varchar(512),
country_code varchar(56),
shipping_cost numeric(28,4),
return_cost numeric(28,4)
);

create table rental_events (
order_item_id varchar(256),
event_id varchar(256),
event_type varchar(512),
charge_id varchar(1024),
event_start_date timestamp,
event_end_date timestamp,
extension_type varchar(256),
credit_amount decimal(28,4),
due_date_original timestamp,
due_date_extended timestamp
);

create table rental_charges (
charge_id varchar(1024),
customer_id varchar(512),
charge_type varchar(1024),
amount decimal(28,4),
response varchar(1024)
);

The schema above follows these rules:

  1. An order_item_id in the rental_orders table can have multiple events, which are present in the rental_events table.
  2. Each event in rental_events can have multiple charges, which are stored in the rental_charges table.

So, to design a denormalized fact table from the above schema, I thought of using an accumulating snapshot fact table. But in my scenario there are 35 distinct events that can happen for a rental order_item_id, and my business wants to record data and timestamps for every event. Since each event can have its own extension_type, credit_amount, due_date_original, and due_date_extended, an accumulating snapshot would need 35 * 4 = 140 columns, which are too many columns to maintain.

Can anyone suggest a better approach to designing the fact tables for the above schema?

Upvotes: 1

Views: 714

Answers (2)

Red Boy

Reputation: 5739

In fact, a single table with 140 columns is not very large or unusual for a data warehouse (such as Redshift); it is usually considered good design.

Redshift itself supports up to 1,600 columns in a single table.

Though I understand the challenge of transforming normalized data into denormalized data, if you come up with good ETL scripts and processes, the wide table would be the better design in my opinion. We have a similar use case with an e-commerce category tree structure: our OLTP system has only 4 columns, which we translate into 300+ columns in Redshift, and it works well. Hope it helps.
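For illustration, one common way to do this events-to-columns pivot in the ETL is conditional aggregation; here is a sketch against the question's tables (the event type name 'SHIPPED' is made up, substitute your 35 actual event types):

```sql
-- One wide row per order_item_id; repeat the four max(case ...)
-- expressions once for each of the 35 event types.
create table fact_rental_accumulating as
select
    e.order_item_id,
    max(case when e.event_type = 'SHIPPED'
             then e.extension_type end)    as shipped_extension_type,
    max(case when e.event_type = 'SHIPPED'
             then e.credit_amount end)     as shipped_credit_amount,
    max(case when e.event_type = 'SHIPPED'
             then e.due_date_original end) as shipped_due_date_original,
    max(case when e.event_type = 'SHIPPED'
             then e.due_date_extended end) as shipped_due_date_extended
    -- ... four more columns per remaining event type ...
from rental_events e
group by e.order_item_id;
```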

Upvotes: 2

demircioglu

Reputation: 3465

Based on your description of the issue you are facing, you can use an approach I call a 'pivoted fact': it stores values in rows, with a link to the parent table, instead of spreading them across many columns.

Create an event_details table:

create table event_details (
event_detail_id integer,       -- primary key for this table
event_id varchar(256),         -- link to the parent rental_events table; use what makes sense for you
extension_type varchar(256),   -- additional attributes for each event
credit_amount decimal(28,4),
due_date_original timestamp,
due_date_extended timestamp
);

With this you can store one or more details per event as rows instead of creating 140 columns. You can always create derived tables for selected event types, etc., based on requirements.

Also, for best practice and performance, you might want to consider using integer or bigint for your IDs where possible.
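As an example of such a derived table, a narrow fact for one event type could be built like this (the event type value 'EXTENSION' is illustrative, not from the question):

```sql
-- Narrow derived table for a single event type, joining each
-- parent event row to its detail rows.
create table extension_event_details as
select e.order_item_id,
       e.event_start_date,
       d.extension_type,
       d.credit_amount,
       d.due_date_original,
       d.due_date_extended
from rental_events e
join event_details d on d.event_id = e.event_id
where e.event_type = 'EXTENSION';
```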

Upvotes: 1
