Mohammad Asif
Mohammad Asif

Reputation: 113

Extending a fact to support 1000+ columns

Background:-

Currently I have a very wide fact table with almost 800 columns and nearly 100 millions rows in Oracle database which is storing events and each event has multiple attributes. Each attribute corresponds to a column in the database. The onboarding of event and attribute is from the product side and the steps are automated and hence at dev side we don't have any control over it.

At the product team side, we have created Materialized views for each event (precomputed) for faster retrieval. Hence from one huge fact table, we have created multiple smaller MVs for each event.

Problem:-

Oracle database has a capacity of 1000 columns only. As the product team is onboarding new events and attributes, the 1000 limit of columns will be breached. I wanted to understand how should I scale this.

Few approaches:-

Approach 1: Add another fact table which gives me support for next 1000 attributes. but problem is i would need to keep on adding multiple fact table as and when the column limit is breached. Also i would need to keep track which attribute or event is present in which fact table. Also i suspect there will be few scenarios where attributes from multiple fact table need be joined for some computation.

Approach 2: Use entity-relation model to group similar attributes together into a separate table and use a foreign key relation. This way have a single fact table with foreign key relation to multiple attribute groups. Aggregation based on attribute will be complicated as the attribute is now one level deep and now I will have multiple narrow tables with millions of rows instead of just single wide table.

Approach 3: Redesign the table to store Events as fact table. But  common attributes will be part of every event and there would redundancy storage. Also another problem i see is co-relating multiple events together will be complex.

Please help/suggest a better approach for the problem.

Upvotes: 0

Views: 77

Answers (2)

Jon Heller
Jon Heller

Reputation: 36882

Approach 4: Upgrade to 23ai and use wide tables that support up to 4096 columns. Version 23ai is the next long term release, so your organization will likely upgrade to that release some day anyway. Hopefully you can time the upgrade to happen before you run out of columns.

Upvotes: 0

MT0
MT0

Reputation: 168212

If you have a table that looks like:

CREATE TABLE events (
  -- Common to all events
  id         NUMBER PRIMARY KEY,
  start_date DATE,
  end_date   DATE,
  -- Task Related columns
  task_owner              VARCHAR2(50),
  task_expected_duration  NUMBER(6,2),
  -- Meeting Related columns
  meeting_organiser  VARCHAR2(50),
  meeting_required   VARCHAR2(4000),
  meeting_optional   VARCHAR2(4000),
  meeting_location   VARCHAR2(200)
)

Where when you insert a task then you insert data into the common and task columns but the meeting columns will be NULL. Similarly, for an event there would be values in the common and meeting columns but the task columns would be NULL.

Then you should not have one table, it should be (at least) 3 tables that are related by relational constraints:

CREATE TABLE events (
  id         NUMBER PRIMARY KEY,
  start_date DATE,
  end_date   DATE
);

CREATE TABLE task_events (
  id                 PRIMARY KEY REFERENCES events(id),
  owner              VARCHAR2(50),
  expected_duration  NUMBER(6,2),
);

CREATE TABLE meeting_events (
  id         PRIMARY KEY REFERENCES events(id),
  organiser  VARCHAR2(50),
  required   VARCHAR2(4000),
  optional   VARCHAR2(4000),
  location   VARCHAR2(200)
);

If you then want to see the columns relating to a task you can use:

SELECT e.*, t.owner, t.expected_duration
FROM   events e INNER JOIN task_events t ON .id = t.id;

and if you want to see a meeting then:

SELECT e.*, m.organiser, m.required, m.optional, m.location
FROM   events e INNER JOIN meeting_events m ON .id = m.id;

Upvotes: 2

Related Questions