satheesh

Reputation: 1451

Transfer Huge data using Java

I have a requirement of transferring a huge amount of data (nearly 10 million records) from one database (Oracle) to another database (Postgres) using a Java program. I have done this by creating connections to the two DBs, querying the data from the source DB, and then inserting it into the destination DB while iterating over the result set. But it's taking a huge amount of time to transfer the data. Is there any way to make the transfer process quicker?

Upvotes: 0

Views: 5098

Answers (3)

memo

Reputation: 441

The problem here is that programming languages use cursors to handle sets of tuples, and cursors can only iterate through them; you can't do bulk loading or anything like that. This is true, I think, for every programming language. A faster solution would be connecting Oracle to PostgreSQL somehow; I'm not sure how to do that, but I think it should be possible. There are Oracle functions for everything.

Upvotes: 0

Mike

Reputation: 3311

If you are limited to a single-threaded read-data, write-data loop, there's not a whole lot of room for improvement.

This type of performance is limited by a couple of different things: the amount of data you're moving across the wire, the speed of your network, database indexing and configuration, as well as some other things in the network / host.

At a minimum, you should be setting your read connection up with a larger fetch size.

ResultSet rs;
...
// fetch 500 rows per round trip instead of the driver's default
rs.setFetchSize(500);

On the insert side, you should also look at batching using a CallableStatement:

CallableStatement cs;
Connection conn;
conn.setAutoCommit(false);   // commit once per batch instead of once per row
...
cs.addBatch();               // queue the current row rather than executing it immediately

if (rowCount % batchSize == 0) {
   // send the whole batch in one round trip, then commit it
   int[] updateCounts = cs.executeBatch();

   conn.commit();
   batchCount = 0;           // reset the per-batch record counter

   // check the per-row results and log any record that did not insert cleanly
   for (int i = 0; i < updateCounts.length; i++) {
        if (updateCounts[i] < 1)
           bad.write(batchRec[i]);
   }
}
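
Putting those two ideas together, a minimal sketch of the whole read-and-batch-insert loop might look something like the following. The connection URLs, table and column names are made up for illustration, and a plain PreparedStatement is used for the insert, though a CallableStatement supports the same addBatch/executeBatch calls.

import java.sql.*;

public class CopyTable {
    public static void main(String[] args) throws SQLException {
        // hypothetical connection details -- substitute your own
        try (Connection src = DriverManager.getConnection("jdbc:oracle:thin:@//srchost:1521/ORCL", "user", "pass");
             Connection dst = DriverManager.getConnection("jdbc:postgresql://dsthost:5432/target", "user", "pass")) {

            dst.setAutoCommit(false);            // commit once per batch, not per row
            final int batchSize = 1000;

            try (Statement sel = src.createStatement();
                 PreparedStatement ins = dst.prepareStatement(
                         "insert into target_table (a, b, c) values (?, ?, ?)")) {

                sel.setFetchSize(500);           // pull 500 rows per round trip from Oracle
                try (ResultSet rs = sel.executeQuery("select a, b, c from source_table")) {
                    int rowCount = 0;
                    while (rs.next()) {
                        ins.setLong(1, rs.getLong(1));
                        ins.setString(2, rs.getString(2));
                        ins.setString(3, rs.getString(3));
                        ins.addBatch();
                        if (++rowCount % batchSize == 0) {
                            ins.executeBatch();  // one round trip for the whole batch
                            dst.commit();
                        }
                    }
                    ins.executeBatch();          // flush the final partial batch
                    dst.commit();
                }
            }
        }
    }
}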

There are other things you can do in Oracle for insert performance, one of which is setting up a bulk load using a named pipe that your process can then write to. Those are non-logged operations, so it's pretty fast. I haven't done the named pipe thing from Java, so it's something to look into, but that should get you going.

You need to figure out where your bottleneck is. I have seen performance dwindle over time because the query is table scanning some table, and it takes longer to retrieve data for the later rows than for the earlier rows.

Like anything else, you need to introduce timing to see whether your select starts to take more time or whether the read performance stays pretty stable (later row fetches taking longer than earlier ones is a good indication of table scanning).
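
A crude sketch of that timing, reusing the rowCount/batchSize bookkeeping from the loop above (logBatchTime is just a hypothetical helper, not part of any API):

// call this once per batch boundary; if the reported times climb steadily as
// rowCount grows, the later fetches are getting slower -- a hint that the
// source query is table scanning
static long lastBatchStart = System.nanoTime();

static void logBatchTime(int rowCount, int batchSize) {
    long now = System.nanoTime();
    System.out.printf("rows %d-%d took %d ms%n",
            rowCount - batchSize, rowCount, (now - lastBatchStart) / 1_000_000);
    lastBatchStart = now;
}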

Lastly, if you can break the query down neatly, you can employ multiple worker threads to process the data in parallel.

i.e. instead of

select a,b,c from source_table

You would break it down like

select a,b,c from source_table where a < 10;
select a,b,c from source_table where a >= 10 and a < 50;
select a,b,c from source_table where a >= 50;
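
One possible way to fan those range queries out across worker threads is sketched below; copyRange is a hypothetical method standing in for the fetch/batch-insert loop shown earlier, and each worker would need its own pair of connections.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelCopy {
    public static void main(String[] args) throws InterruptedException {
        List<String> rangeQueries = List.of(
                "select a,b,c from source_table where a < 10",
                "select a,b,c from source_table where a >= 10 and a < 50",
                "select a,b,c from source_table where a >= 50");

        ExecutorService pool = Executors.newFixedThreadPool(rangeQueries.size());
        for (String query : rangeQueries) {
            // each task copies just its own slice of the table
            pool.submit(() -> copyRange(query));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void copyRange(String query) {
        // open this worker's own Oracle and Postgres connections, run the query
        // with a large fetch size, batch the inserts, and commit periodically,
        // as in the earlier sketch
    }
}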

Like anything else, there's a hundred ways to do things.

Upvotes: 1

nairbv

Reputation: 4323

One alternative would just be to export all of the data in the table into a text file and then import that data directly into Postgres. I don't remember what export functionality Oracle has, but worst case you can always just use a query like this, dumping the contents of the table as a set of insert statements:

select 'insert into mynewtable values(' || old.a || ', ' || old.b || ...etc..|| ');' from myoldtable old;
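If you go the text-file route, a rough sketch (with made-up file, table and column names) is to stream the result set out as delimited text from Java and then load the file with Postgres's COPY / \copy:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.*;

public class DumpToCsv {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection("jdbc:oracle:thin:@//srchost:1521/ORCL", "user", "pass");
             Statement st = src.createStatement();
             BufferedWriter out = new BufferedWriter(new FileWriter("myoldtable.csv"))) {

            st.setFetchSize(1000);
            try (ResultSet rs = st.executeQuery("select a, b, c from myoldtable")) {
                while (rs.next()) {
                    // naive CSV output: real data may need quoting/escaping of delimiters
                    out.write(rs.getString(1) + "," + rs.getString(2) + "," + rs.getString(3));
                    out.newLine();
                }
            }
        }
        // then, in psql:  \copy mynewtable from 'myoldtable.csv' with (format csv)
    }
}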

I've definitely processed 10MM records from an Oracle database (using Java) within a period of a few hours (with a lot of processing between each record). What kind of performance are you hoping for and what are you getting now?

Do you have a lot of indexes or constraints being checked as you insert into the postgres table? Maybe something else is wrong with your code? If you know that all of the rows are valid, maybe you should drop the constraints in the postgres db while doing your inserts?

Or, if you haven't in a while, maybe you need to vacuum the database?

Upvotes: 1
