Reputation: 113
Background:-
Currently I have a very wide fact table with almost 800 columns and nearly 100 millions rows in Oracle database which is storing events and each event has multiple attributes. Each attribute corresponds to a column in the database. The onboarding of event and attribute is from the product side and the steps are automated and hence at dev side we don't have any control over it.
At the product team side, we have created Materialized views for each event (precomputed) for faster retrieval. Hence from one huge fact table, we have created multiple smaller MVs for each event.
Problem:-
Oracle database has a capacity of 1000 columns only. As the product team is onboarding new events and attributes, the 1000 limit of columns will be breached. I wanted to understand how should I scale this.
Few approaches:-
Approach 1: Add another fact table which gives me support for next 1000 attributes. but problem is i would need to keep on adding multiple fact table as and when the column limit is breached. Also i would need to keep track which attribute or event is present in which fact table. Also i suspect there will be few scenarios where attributes from multiple fact table need be joined for some computation.
Approach 2: Use entity-relation model to group similar attributes together into a separate table and use a foreign key relation. This way have a single fact table with foreign key relation to multiple attribute groups. Aggregation based on attribute will be complicated as the attribute is now one level deep and now I will have multiple narrow tables with millions of rows instead of just single wide table.
Approach 3: Redesign the table to store Events as fact table. But common attributes will be part of every event and there would redundancy storage. Also another problem i see is co-relating multiple events together will be complex.
Please help/suggest a better approach for the problem.
Upvotes: 0
Views: 77
Reputation: 36882
Approach 4: Upgrade to 23ai and use wide tables that support up to 4096 columns. Version 23ai is the next long term release, so your organization will likely upgrade to that release some day anyway. Hopefully you can time the upgrade to happen before you run out of columns.
Upvotes: 0
Reputation: 168212
If you have a table that looks like:
CREATE TABLE events (
-- Common to all events
id NUMBER PRIMARY KEY,
start_date DATE,
end_date DATE,
-- Task Related columns
task_owner VARCHAR2(50),
task_expected_duration NUMBER(6,2),
-- Meeting Related columns
meeting_organiser VARCHAR2(50),
meeting_required VARCHAR2(4000),
meeting_optional VARCHAR2(4000),
meeting_location VARCHAR2(200)
)
Where when you insert a task then you insert data into the common and task columns but the meeting columns will be NULL
. Similarly, for an event there would be values in the common and meeting columns but the task columns would be NULL
.
Then you should not have one table, it should be (at least) 3 tables that are related by relational constraints:
CREATE TABLE events (
id NUMBER PRIMARY KEY,
start_date DATE,
end_date DATE
);
CREATE TABLE task_events (
id PRIMARY KEY REFERENCES events(id),
owner VARCHAR2(50),
expected_duration NUMBER(6,2),
);
CREATE TABLE meeting_events (
id PRIMARY KEY REFERENCES events(id),
organiser VARCHAR2(50),
required VARCHAR2(4000),
optional VARCHAR2(4000),
location VARCHAR2(200)
);
If you then want to see the columns relating to a task you can use:
SELECT e.*, t.owner, t.expected_duration
FROM events e INNER JOIN task_events t ON .id = t.id;
and if you want to see a meeting then:
SELECT e.*, m.organiser, m.required, m.optional, m.location
FROM events e INNER JOIN meeting_events m ON .id = m.id;
Upvotes: 2