Chad

Reputation: 24699

Best way/technology to implement a generic archive process

We are looking to retire old data in our Oracle warehouse.

To give a very simplified overview, a proposal was suggested to develop a process using PL/SQL stored procedures that, given parameters such as the source and destination tables, uses the Oracle ALL_TAB_COLUMNS view to construct a target table that mirrors the source table.

If the destination table exists from a previous run, the proposed solution compares the current schema of the source table to the target (archive) table and, if differences are found, brings the tables into sync. I am sure the proposed functionality has limitations, and while the spec appeared pretty ambitious in this area, I doubt they were going to re-write Red Gate's SQL Compare utility in PL/SQL.
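For context, the core of such a proposal would presumably look something like the following sketch, which queries ALL_TAB_COLUMNS and builds the mirror table with dynamic SQL. All owner, table, and column handling here is illustrative, not the actual spec:

```sql
-- Hypothetical sketch of the proposed approach: generate a mirror table's
-- DDL from ALL_TAB_COLUMNS. Schema/table names are made up for illustration.
DECLARE
  l_ddl VARCHAR2(32767);
BEGIN
  l_ddl := 'CREATE TABLE arch_owner.orders_arch (';
  FOR c IN (SELECT column_name, data_type, data_length,
                   data_precision, data_scale
              FROM all_tab_columns
             WHERE owner = 'APP_OWNER'
               AND table_name = 'ORDERS'
             ORDER BY column_id)
  LOOP
    l_ddl := l_ddl || c.column_name || ' ' || c.data_type ||
             CASE
               WHEN c.data_type = 'VARCHAR2'
                 THEN '(' || c.data_length || ')'
               WHEN c.data_type = 'NUMBER' AND c.data_precision IS NOT NULL
                 THEN '(' || c.data_precision || ',' || NVL(c.data_scale, 0) || ')'
             END || ',';
  END LOOP;
  l_ddl := RTRIM(l_ddl, ',') || ')';
  EXECUTE IMMEDIATE l_ddl;
END;
/
```

Note that for the initial creation alone, `CREATE TABLE arch_owner.orders_arch AS SELECT * FROM app_owner.orders WHERE 1 = 0` achieves the same result without any dictionary parsing; the ALL_TAB_COLUMNS machinery only earns its keep for the schema-comparison step.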

I guess I have two questions.

1) Is PL/SQL really the right language to use for such a task? To me, stored procedures are for quick-in, quick-out data operations, and complex logic belongs in what I would consider a more fully featured client language such as C# or some other .NET language. I anticipate a 10,000-line, poorly indented single stored procedure, and I cringe at having to review it. I know Oracle stored procedures/packages need not be that way, but, for some reason, our developers tend to be less modular when using PL/SQL than when writing in .NET. I would welcome your recommendations and the reasons for your choice.

2) Are there Oracle utilities (I think we are on 10g) that can be used for archiving purposes? Does anyone have any suggestions or advice?

I'll upvote any non-repeated worthwhile comment offered.

Thx.

Upvotes: 1

Views: 540

Answers (5)

APC

Reputation: 146269

You say this is a data warehouse. Are you using partitioning? If so, does the partitioning scheme identify the rows you want to archive? If the answer to both questions is "yes" then partition exchange could be the feature you're searching for.
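If the answer is yes, the operation is a near-instant data dictionary swap rather than a row-by-row copy. A sketch, with a made-up `sales` table partitioned by year:

```sql
-- Hypothetical sketch of partition exchange (table and partition names
-- are illustrative). Create an empty standalone table with the same
-- structure, then swap the old partition's segment into it.
CREATE TABLE sales_2008_arch AS
  SELECT * FROM sales WHERE 1 = 0;

ALTER TABLE sales
  EXCHANGE PARTITION p_2008 WITH TABLE sales_2008_arch;

-- sales_2008_arch now owns the old rows; remove the (now empty) partition.
ALTER TABLE sales DROP PARTITION p_2008;
```

The exchange is a metadata operation, so it works regardless of partition size; index handling (`INCLUDING INDEXES`, validation options) needs to be matched to your actual index layout.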

Upvotes: 1

Venr

Reputation: 937

Maybe I'm not reading the requirements correctly, but wouldn't a simple

create table <dest_table> as select * from <source_table>;

suffice, with a drop of the dest_table first if it already exists?
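The drop-then-recreate variant can be sketched as below; the table names are illustrative, and the exception handler swallows only "table does not exist" so the first run works too:

```sql
-- Sketch: drop the archive table if present, then recreate it from source.
BEGIN
  EXECUTE IMMEDIATE 'DROP TABLE orders_arch PURGE';
EXCEPTION
  WHEN OTHERS THEN
    IF SQLCODE != -942 THEN  -- ORA-00942: table or view does not exist
      RAISE;
    END IF;
END;
/
CREATE TABLE orders_arch AS SELECT * FROM orders;
```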

Upvotes: 0

Will Hartung

Reputation: 118744

However you "do it", it will need to be done by hand.

Retiring data in an RDBMS is fraught with peril, because you typically can't just archive a single table. You need to archive all of its dependent tables as well.

Then there's the schema change issue. Not so much keeping your archive in sync with your evolving schema, but keeping your tools in sync with obsolete schemas. It's not as if you can point your current applications at the "old data" and expect them to necessarily work. It's hard enough to keep your apps up to date with current data, much less have them behave reasonably with old data.

If you're archiving select subsets of your data, it's simply safer, and actually easier, to craft the select and insert statements by hand, ensuring integrity, checking values, etc., than to rely on some contrived tool. It may seem arduous up front, but it's really just tedious.

But once done, you'll have much more control over what and how data is being exported and merged.

Writing it in PL/SQL is smart simply because this is a database operation. Why drag all of the data out of the server just to stuff it back into it? The PL/SQL version will likely have better overall performance when all is said and done.
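A hand-crafted archive step of the kind described might look like this sketch; the tables, the parent/child relationship, and the seven-year cutoff are all assumptions for illustration:

```sql
-- Hypothetical hand-written archive: copy children then parent into the
-- archive tables, delete children then parent, all in one transaction.
DECLARE
  c_cutoff CONSTANT DATE := ADD_MONTHS(SYSDATE, -84);  -- assumed 7-year retention
BEGIN
  INSERT INTO order_items_arch
    SELECT i.*
      FROM order_items i
     WHERE i.order_id IN (SELECT order_id FROM orders
                           WHERE order_date < c_cutoff);

  INSERT INTO orders_arch
    SELECT * FROM orders WHERE order_date < c_cutoff;

  -- Delete in child-first order so foreign keys are never violated.
  DELETE FROM order_items
   WHERE order_id IN (SELECT order_id FROM orders
                       WHERE order_date < c_cutoff);

  DELETE FROM orders WHERE order_date < c_cutoff;

  COMMIT;
END;
/
```

Keeping the copy and the delete in one transaction ensures a row is never removed from the live tables without having landed in the archive.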

As for ensuring modularity, indention, etc., well, that's why baseball bats were invented.

Upvotes: 2

DCookie

Reputation: 43533

PL/SQL is NOT just for "quick-in and quick-out" data operations. There are very substantial apps built on it. There is nothing inherently wrong with PL/SQL for this kind of task. That said, if you anticipate a poorly written 10K-line procedure in PL/SQL, don't use it. Let your programmers do what they do best.

Upvotes: 3

Amirshk

Reputation: 8258

First, this sounds like a task for PL/SQL. The non-modular code issue can be addressed through standards and review, and using PL/SQL will give you better results and be easier to write.

As for the concept itself, any solution you build will have a problem if the schema is updated: the sync will fail, or worse, it won't fail and the data will be corrupted.

What about having a replication server, where you add a "delete old records" job on the main server and perform only the inserts/updates on the offline server? This would let you keep all the data while keeping the live database smaller.
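Under that setup, the live-side purge reduces to a periodic delete like the sketch below (table, column, and the seven-year cutoff are assumptions; the replica-side configuration that keeps the deletes from propagating is not shown):

```sql
-- Hypothetical purge job on the live server only; the offline replica
-- retains the rows because deletes are not applied there.
DELETE FROM orders
 WHERE order_date < ADD_MONTHS(SYSDATE, -84);
COMMIT;
```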

Upvotes: 2
