cloudtechnician

Reputation: 725

Challenge in Migrating Huge Table with LOB Data

We're trying to migrate an Oracle database of around 230 GB hosted on EC2 into RDS. The challenge is a single 150 GB table containing a lot of LOB data. When we migrate with Oracle Import/Export (Data Pump), exporting this 150 GB LOB table takes around 9.5 hours and importing the dump into RDS takes another 2 hours, whereas the other tables migrate fairly quickly. We're using instances with the highest configuration but still see no improvement in performance.

Just to see the time difference, we exported the 150 GB dump again on EC2; the second run took only 3 hours. Could someone please suggest a better approach to reduce the export/import time?

PS: We also tried the RedGate tool to identify schema and data differences between the Oracle databases, but it too failed to perform a comparison on the huge LOB tables.

Upvotes: 2

Views: 2542

Answers (2)

TenG

Reputation: 4004

Regarding your PS note on using RedGate to compare data, I assume this is for data reconciliation.

Again, I am only able to give general guidance, but on my last migration I had terabytes of data to compare; the traditional approach of reporting to disk and diffing the files was too slow, and we didn't have enough disk space.

So what I did was write a Java program (Java because it is usually available on the Oracle server, and you really want to be running on one of the servers to minimise network bottlenecks) that did the following:

  • Open JDBC connections to both databases
  • Specify the table to process, the range of IDs, and a batch size
  • Read rows in the given ID range from the table and fill a memory buffer of batch size - I used java.util.concurrent.ArrayBlockingQueue objects (implementations of java.util.concurrent.BlockingQueue)
  • Use threads to read the data from both databases
  • When both queues fill up, run a compare on the two queues (see the sketch after this list)
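A minimal sketch of that dual-queue approach, assuming a numeric ID key; the connection URLs, credentials, table and column names here are placeholders, not the ones from my project:

    import java.sql.*;
    import java.util.concurrent.*;

    public class RangeComparer {

        private static final String EOF = "<EOF>"; // sentinel marking the end of a range

        public static void main(String[] args) throws Exception {
            final long lo = Long.parseLong(args[0]);
            final long hi = Long.parseLong(args[1]);

            // One bounded queue per database; a full queue blocks its reader thread.
            BlockingQueue<String> srcQ = new ArrayBlockingQueue<>(1000);
            BlockingQueue<String> tgtQ = new ArrayBlockingQueue<>(1000);

            new Thread(() -> readRange("jdbc:oracle:thin:@ec2-host:1521/SRC", lo, hi, srcQ)).start();
            new Thread(() -> readRange("jdbc:oracle:thin:@rds-host:1521/TGT", lo, hi, tgtQ)).start();

            // Both readers ORDER BY id, so matching rows arrive in step.
            while (true) {
                String a = srcQ.take();
                String b = tgtQ.take();
                if (a.equals(EOF) || b.equals(EOF)) {
                    if (!a.equals(b))
                        System.out.println("Row count mismatch in range " + lo + "-" + hi);
                    break;
                }
                if (!a.equals(b))
                    System.out.println("DIFF: " + a + " <> " + b);
            }
        }

        private static void readRange(String url, long lo, long hi, BlockingQueue<String> q) {
            String sql = "SELECT id, col1, col2 FROM big_table " +
                         "WHERE id BETWEEN ? AND ? ORDER BY id";
            try (Connection c = DriverManager.getConnection(url, "user", "pass");
                 PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setLong(1, lo);
                ps.setLong(2, hi);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Serialise the row to a comparable string; put() blocks when the queue is full.
                        q.put(rs.getLong(1) + "|" + rs.getString(2) + "|" + rs.getString(3));
                    }
                }
                q.put(EOF);
            } catch (SQLException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }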

Regarding comparing LOBs, the approach I took was to use DBMS_CRYPTO to hash the LOB values and compare the hashes. This reduces the amount of data being read into Java. Any differences were highlighted for further investigation.
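For example, a query along these lines keeps the hashing on the database side, so only a short digest per row comes back over JDBC. The literal 3 is the value of DBMS_CRYPTO.HASH_SH1 (package constants cannot be referenced from plain SQL), the session needs an EXECUTE grant on DBMS_CRYPTO, and the table and column names are placeholders:

    import java.sql.*;

    public class LobHashReader {
        // Prints id -> SHA-1 digest for a range of rows, hashed server-side
        // by DBMS_CRYPTO so only ~20 bytes per row cross the network.
        static void printHashes(Connection c, long lo, long hi) throws SQLException {
            String sql = "SELECT id, RAWTOHEX(DBMS_CRYPTO.HASH(lob_col, 3)) " + // 3 = HASH_SH1
                         "FROM big_table WHERE id BETWEEN ? AND ? ORDER BY id";
            try (PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setLong(1, lo);
                ps.setLong(2, hi);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1) + " " + rs.getString(2));
                    }
                }
            }
        }
    }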

Obviously, being an external Java process, it could be run in parallel up to the optimal number of processes. This method proved faster than the other tools we had available.

Since this project I have discovered that Oracle provides a package called DBMS_COMPARISON. It might be worth your while to look at it. I am looking for a chance to compare it to my custom solution.
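I haven't used it myself yet, so treat this purely as a sketch of the call sequence from the documentation; the comparison name, schema, table and database link are placeholders, and you should check whether its supported column types cover your LOB columns:

    import java.sql.*;

    public class DbmsComparisonDemo {
        static void compareTable(Connection c) throws SQLException {
            // One-off setup: define the comparison against the remote copy over a db link.
            try (CallableStatement cs = c.prepareCall(
                    "BEGIN DBMS_COMPARISON.CREATE_COMPARISON(" +
                    "  comparison_name => 'CMP_BIG_TABLE'," +
                    "  schema_name     => 'APP'," +
                    "  object_name     => 'BIG_TABLE'," +
                    "  dblink_name     => 'RDS_LINK'); END;")) {
                cs.execute();
            }

            // COMPARE returns a PL/SQL BOOLEAN, which JDBC cannot bind directly,
            // so map it to 0/1 inside the anonymous block.
            try (CallableStatement cs = c.prepareCall(
                    "DECLARE " +
                    "  info       DBMS_COMPARISON.COMPARISON_TYPE; " +
                    "  consistent BOOLEAN; " +
                    "BEGIN " +
                    "  consistent := DBMS_COMPARISON.COMPARE( " +
                    "      comparison_name => 'CMP_BIG_TABLE', " +
                    "      scan_info       => info, " +
                    "      perform_row_dif => TRUE); " +
                    "  ? := CASE WHEN consistent THEN 1 ELSE 0 END; " +
                    "  ? := info.scan_id; " +
                    "END;")) {
                cs.registerOutParameter(1, Types.INTEGER);
                cs.registerOutParameter(2, Types.INTEGER);
                cs.execute();
                System.out.println("consistent=" + (cs.getInt(1) == 1) +
                                   ", scan_id=" + cs.getInt(2));
            }
        }
    }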

Upvotes: 0

TenG

Reputation: 4004

The quickest way I found to migrate large volumes of LOBs was as follows:

Extract

  • Write a custom extract (in Java) that processes a given range of ID values.
  • Write the non-LOB data for each row as a CSV record, write each LOB to its own file, and reference the LOB file in the CSV record for that row (see the sketch after this list).
  • Run as many extracts in parallel as you can (these are external Java processes, so they should not be blocked by Oracle licence restrictions on parallelism).
  • Preferably write the extracts to a file system that can be "swung" between the two servers. If possible, have as many of these file systems as you have parallel extract processes running.
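A minimal sketch of such an extractor, assuming a numeric ID column and a single BLOB column; all names, URLs and paths are placeholders:

    import java.io.*;
    import java.nio.file.*;
    import java.sql.*;

    public class LobExtractor {
        public static void main(String[] args) throws Exception {
            long lo = Long.parseLong(args[0]);
            long hi = Long.parseLong(args[1]);
            Path outDir = Paths.get(args[2]); // e.g. a mount on a "swinging" file system
            Files.createDirectories(outDir);

            String sql = "SELECT id, col1, lob_col FROM big_table WHERE id BETWEEN ? AND ?";
            try (Connection c = DriverManager.getConnection(
                     "jdbc:oracle:thin:@ec2-host:1521/SRC", "user", "pass");
                 PreparedStatement ps = c.prepareStatement(sql);
                 PrintWriter csv = new PrintWriter(Files.newBufferedWriter(
                     outDir.resolve("rows_" + lo + "_" + hi + ".csv")))) {
                ps.setLong(1, lo);
                ps.setLong(2, hi);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        long id = rs.getLong(1);
                        String lobFile = "lob_" + id + ".bin";
                        // Stream the LOB straight to its own file.
                        try (InputStream in = rs.getBinaryStream(3)) {
                            if (in != null) {
                                Files.copy(in, outDir.resolve(lobFile),
                                           StandardCopyOption.REPLACE_EXISTING);
                            }
                        }
                        // The CSV record carries the LOB file name for SQL*Loader's LOBFILE clause.
                        csv.println(id + "," + rs.getString(2) + "," + lobFile);
                    }
                }
            }
        }
    }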

Load

  • Use SQL*Loader. Its control file can load each LOB from its own file via the LOBFILE clause:

    image_fname FILLER CHAR(80),
    image       LOBFILE(image_fname) TERMINATED BY EOF

  • Experiment with DIRECT=Y - on some versions (10g) I found it a little unreliable and got regular crashes in my project, but it may have improved for this type of load.

  • Again, use parallel SQL*Loader processes where possible - you might want to disable constraints and indexes, or use reverse key indexes, to reduce block contention during parallel loads (see the sketch after this list).
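For instance, you might knock the indexes and constraints out before the parallel loads and rebuild them afterwards; a rough sketch with hypothetical object names:

    import java.sql.*;

    public class PreLoadPrep {
        static void prepareForLoad(Connection c) throws SQLException {
            try (Statement st = c.createStatement()) {
                // Remove the contention points before the parallel loads start.
                st.execute("ALTER TABLE big_table DISABLE CONSTRAINT big_table_fk1");
                st.execute("ALTER INDEX big_table_ix1 UNUSABLE");
                // A reverse key index is the alternative if the index must stay usable:
                //   CREATE INDEX big_table_ix1 ON big_table (id) REVERSE;
            }
        }

        static void finishAfterLoad(Connection c) throws SQLException {
            try (Statement st = c.createStatement()) {
                // Rebuild and re-enable once all loader processes have finished.
                st.execute("ALTER INDEX big_table_ix1 REBUILD PARALLEL 4");
                st.execute("ALTER TABLE big_table ENABLE CONSTRAINT big_table_fk1");
            }
        }
    }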

You might also consider partitioning the target table and loading each partition in parallel.

The advantage of using "swinging" file systems is that you eliminate network bottlenecks.

These notes are a general guide, not specific, and will require some tuning and experimentation to get the optimal mix.

Upvotes: 1
