KeeyPee
KeeyPee

Reputation: 310

SQL: Build a materialized view from a materialized view with billion of rows

we are currently migrating from one database to another. We have like 60 million users and 15 billion contacts. For the migration we want to seperate the both tables to only hold only active objects.

We are creating a view that only contains active users that logged in later than 2016 or are still paying. ~ 8 million rows

Now we want to create a view with valid contacts that contains only user and partner from the user view. ~ 400 million rows

The building & refreshing of the view from the users takes like 1h. But the view from the contacts are taking like 14 days. Refreshing it afterwards WITHOUT CHANGES more then 5 days. -> We are building the users view with logs. For the contacts we are using a log contacts and the view of the users.

CREATE MATERIALIZED VIEW LOG ON SCHEMA.USER with rowid, primary key including new values;
CREATE materialized view SCHEMA.ACTIVE_USERS
    BUILD DEFERRED
    REFRESH FAST
    DISABLE QUERY REWRITE
as
select u.*, u.ROWID as U_ROWID
from SCHEMA.USER u
where -- only active users;
CREATE MATERIALIZED VIEW LOG ON SCHEMA.CONTACTS
    WITH PRIMARY KEY, ROWID
    INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW SCHEMA.ACTIVE_CONTACTS
    PARALLEL
    BUILD DEFERRED
    REFRESH FAST
    DISABLE QUERY REWRITE
AS
SELECT /*+ PARALLEL */ EM.*,
    EM.ROWID AS EMROWID
FROM SCHEMA.CONTACTS EM
INNER JOIN SCHEMA.ACTIVE_USERS AU ON AU.userid = EM.userid
INNER JOIN SCHEMA.ACTIVE_USERS AU2 ON AU2.userid = EM.partnerid
WHERE -- only active contacts;

What is the best solution to resolve this?

Upvotes: 0

Views: 664

Answers (1)

Oliver
Oliver

Reputation: 3255

As with all performance questions, there are more guesses than answers. Here are a few:

  • Joining an 18 billion rows table to a 6 million rows table - twice - isn't going to be fast in any case
  • Maybe some indexes on ACTIVE_USERS might help
  • Building ACTIVE_CONTACTS from ACTIVE_CONTACTS is surely a typo. I guess you mean to query from CONTACTS
  • don't underestmate the complexity of "WHERE -- only active contacts". Maybe its better to filter that out to a mat. View first, then join.
  • To gain insight on performance, try transforming you scenario to multiple CREATE TBALE AS SELECT... statements. That way, you can evaluate the execution plans for each step and see the bottlenecks there. If you hit memory limits for the join, it might spill to temp tablespace, causing massive slowdowns.

Upvotes: 1

Related Questions