Stack Bounce

Reputation: 1

Oracle replication with limited range

For performance reasons, I want a portion of my data to be available on a second system (reporting).

I expect the data to be distributed like: (abstract)

There are several approaches to solving this, and I am still open to new ones.

The preferred one would be to use Oracle's snapshots (materialized views / MV logs), but I need to get rid of the old data (older than "last week"), and I can't do deletes on the materialized view, right?

I cannot rely on a datetime value inside the dataset.

I thought I could prevent replication of deleted data to the MV/target by using the INCLUDING clause*, but since I cannot delete on the MV, I have no clue.

Another option would be to duplicate the archiving procedures and delete data older than "last week" on the new DB, or move the data into the reporting DB and then, after a week, into the archive.

Both would impact either the IT processes or the load on the existing DB more than the snapshot solution would.

Any recommendations so far?

*) The INCLUDING clause, from the Oracle documentation for CREATE MATERIALIZED VIEW LOG:

INCLUDING
Specify INCLUDING to save both new and old values in the log. If this log is for a table on which you have a single-table materialized aggregate view, and if you want the materialized view to be eligible for fast refresh, then you must specify INCLUDING.
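For illustration, I understand the clause would be used roughly like this (the table and columns here are just placeholders, not my real schema):

-- Placeholder table and columns, for illustration only
CREATE MATERIALIZED VIEW LOG ON orders
  WITH ROWID, SEQUENCE (order_date, amount)
  INCLUDING NEW VALUES;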

Upvotes: 0

Views: 32

Answers (1)

Paul W

Reputation: 11603

I'm afraid I'm having trouble understanding the details of your ask with much clarity. However, for replication and archiving in general, here are some things to consider:

Archiving

For archiving (dropping old data), there is no substitute for Oracle table partitioning, normally range partitioning on a date or timestamp column. Dropping old partitions is easy and fast, and far superior to deleting data in bulk, which is very costly. You do have to write the code that identifies and drops those old partitions and schedule it as a script or job, as Oracle won't do that for you, but it's not hard to do; a rough sketch follows below.

If you need to archive from the source, write a job that queries the target and ensures the data has been copied in full before dropping the source partition. Don't try to bundle the partition drop with the replication work: if the data replicates but the partition drop fails, the drop will never be retried and you'll have orphaned partitions that never get cleaned up. Always code expecting failure, and think about automated cleanup so a human doesn't have to fix any mess left behind manually.
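As a rough sketch (the table, columns, and one-week retention are assumptions for illustration, not taken from your system):

-- Assumed reporting table, interval (daily) range-partitioned on a load date
CREATE TABLE report_data (
  id        NUMBER,
  load_date DATE NOT NULL,
  payload   VARCHAR2(4000)
)
PARTITION BY RANGE (load_date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(PARTITION p_initial VALUES LESS THAN (DATE '2024-01-01'));

-- Scheduled cleanup: drop interval partitions whose upper bound is older than a week.
-- HIGH_VALUE is stored as text, so it is evaluated with EXECUTE IMMEDIATE.
BEGIN
  FOR p IN (SELECT partition_name, high_value
              FROM user_tab_partitions
             WHERE table_name = 'REPORT_DATA'
               AND interval = 'YES')  -- skip the initial range partition
  LOOP
    DECLARE
      l_high DATE;
    BEGIN
      EXECUTE IMMEDIATE 'SELECT ' || p.high_value || ' FROM dual' INTO l_high;
      IF l_high < TRUNC(SYSDATE) - 7 THEN
        EXECUTE IMMEDIATE 'ALTER TABLE report_data DROP PARTITION '
                          || p.partition_name || ' UPDATE INDEXES';
      END IF;
    END;
  END LOOP;
END;
/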

Replicating

For replicating data, you will most likely use a database link and pull data from the source into the target locally. While you can use MVs, I find that writing your own replication logic gives you more flexibility and performance. For example, with full-refresh MVs you cannot surgically update only new/changed rows; you have to replace the entire dataset every time. If 99.9% of the data is unchanged each day, that's a massive amount of unnecessary work.

It is far more scalable to write procedures that use MERGE commands to surgically sync your target table, touching only the rows that need to be touched. You can handle both new and changed rows with a single MERGE, and deletes with a second MERGE (it is possible to avoid this second MERGE, but it's not worth it). To enable PQ and PDML (parallelism) for these operations, you may need to stage the new/raw data locally in work tables and then merge into the target table so that you aren't involving a dblink at the time of the merge, though later versions of Oracle seem to have made some improvements in this area.
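A minimal sketch of that pattern (the database link, table, and column names are assumptions for illustration; the delete pass is shown as a plain DELETE rather than a second MERGE, just to keep the sketch short):

-- 0) Clear the local staging work table
TRUNCATE TABLE stg_orders;

-- 1) Stage the raw data locally so the MERGE itself doesn't touch the dblink
INSERT /*+ APPEND */ INTO stg_orders (order_id, status, amount)
SELECT order_id, status, amount
  FROM src_orders@source_db;
COMMIT;

-- 2) Upsert only new/changed rows into the reporting table
MERGE INTO rpt_orders t
USING stg_orders s
   ON (t.order_id = s.order_id)
WHEN MATCHED THEN
  UPDATE SET t.status = s.status,
             t.amount = s.amount
   WHERE t.status <> s.status
      OR t.amount <> s.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, status, amount)
  VALUES (s.order_id, s.status, s.amount);

-- 3) Remove target rows that no longer exist at the source
DELETE FROM rpt_orders t
 WHERE NOT EXISTS (SELECT 1 FROM stg_orders s WHERE s.order_id = t.order_id);
COMMIT;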

Incrementalism

For really massive sources, you will want to replicate incrementally so you don't have to pull the entire table across the link each time just to find changes. If you have "last updated" metadata columns in your source, you can use them to pull only new/changed rows. Don't compare such a column to some offset from SYSDATE: if your job doesn't run for a while, you could end up skipping data. Instead, store the maximum value replicated each time and use that stored value as the starting point for the next replication.
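A sketch of that high-water-mark approach, with assumed names (repl_control, last_replicated, last_updated, plus the staging table from the earlier sketch):

DECLARE
  l_hwm DATE;
  l_new DATE;
BEGIN
  -- Starting point = the maximum value replicated last time, not SYSDATE minus an offset
  SELECT last_replicated
    INTO l_hwm
    FROM repl_control
   WHERE table_name = 'SRC_ORDERS';

  -- Pull only rows changed since the stored high-water mark
  INSERT INTO stg_orders (order_id, status, amount, last_updated)
  SELECT order_id, status, amount, last_updated
    FROM src_orders@source_db
   WHERE last_updated > l_hwm;

  -- Advance the high-water mark only after a successful pull
  SELECT NVL(MAX(last_updated), l_hwm) INTO l_new FROM stg_orders;

  UPDATE repl_control
     SET last_replicated = l_new
   WHERE table_name = 'SRC_ORDERS';

  COMMIT;
END;
/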

If you don't have a suitable metadata column to drive an incremental pull, or you need to support deletes, you'd have to either employ triggers and log tables to record all row changes on the source DB, or use Oracle's "fast" refresh MV option, which internally does the same thing (it employs a trigger and a log table). There is obviously a cost to doing this, and it is often not an option because of how invasive it is on the source. Usually an aggressive replication system involving lots of tables will have to employ a mixture of the above techniques, depending on table size, availability of reliable metadata columns, and the necessity of handling deletes.
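If you do go the MV-log route, the basic shape is below (names are placeholders; note that the log has to live on the source database, which is exactly why this approach is invasive there):

-- On the source DB: the log that captures row changes
CREATE MATERIALIZED VIEW LOG ON src_orders
  WITH PRIMARY KEY;

-- On the reporting DB: a fast-refreshable MV over the database link
CREATE MATERIALIZED VIEW rpt_orders_mv
  REFRESH FAST ON DEMAND
AS
SELECT order_id, status, amount
  FROM src_orders@source_db;

-- Scheduled refresh applies only the changes recorded in the log
BEGIN
  DBMS_MVIEW.REFRESH('RPT_ORDERS_MV', method => 'F');
END;
/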

Upvotes: 1
