Furkan
Furkan

Reputation: 468

Data comparison in java

There are around 1,700,000 rows of data in a table1 and 900,000 rows of data in table2 and these tables are stored in different databases.
I am trying to compare each of them to know if data in table1 exists in table2 or not, by getting data into 2 Resultsets and comparing them roughly like this:

while(Resulset1.next())
 {
   while(Resultset2.next())
    {
      if(Resultset1.getString("whatever").equalsIgnoreCase(Resulset1.getString("whatever")))
       {
       ........
       }
    }
 }

When using ResultSet it is so slow. How can I compare them faster? Any recommendation?

Upvotes: 2

Views: 972

Answers (4)

Persixty
Persixty

Reputation: 8579

One way of comparing across databases in an environment that doesn't support heterogeneous queries is to compare in batches.

Suppose we start with a query

SELECT whatever FROM <Table-1>;

Where Table-1 is one of the tables to compare.

The first comparing query is:

SELECT <field-list> FROM <Table-2> WHERE whatever IN ( V1 , V2 , ... , V1000);

Where the 1000 values V1 thru V1000 are taken from Table-1.

The next query is:

SELECT <field-list> FROM <Table-2> WHERE whatever IN ( V1001 , V1002 , ... , V2000);

How well this performs depends on factors like whether there is a suitable index including whatever on table-2. My expectation is that all things being equal it's faster to 'look' for the smaller table in the larger table just because it will require fewer batches.

Obviously there is nothing special about chunks of 1000 and you might want to work out what the practical limit is for the length of SQL string the server will accept! If you could cram 10000 values you'd be done on 90 queries!

One further uplift might then be to use a prepared statement rather than executing SQL strings. Remember to handle odd-end chunks by then running a special query or back-filling the slack variables.

PS: One very significant benefit of this line of attack is that it doesn't place unbounded memory demands on the Application Server (wherever the Java is running). The other methods such as hash-tables are valid but the storage demands they place on the App. Server grow as the database tables grow.

Upvotes: 1

Paco Abato
Paco Abato

Reputation: 4065

You are doing 1,700,000 * 900,000 queries to the data base. Try downloading sets of rows from the tables and iterating over them. I suggest an example in pseudocode:

set1 = getNext5000RowsFromTable1
set2 = getNext5000RowsFromTable2

while (set1.isNotEmpty() && set2.isNotEmpty()) {
    foreach (row1 : set1){
        foreach (row2 : set2) {
            if (row1.getString("field").equals(row2.getString("field")){
                .....
            }
        }
    set1 = getNext5000RowsFromTable1
    set2 = getNext5000RowsFromTable2
}

Upvotes: 0

Rikard
Rikard

Reputation: 692

Retrieving common whatever using SQL:

SELECT whatever FROM TableA WHERE whatever IN (SELECT whatever IN TableB)

If you must do it in Java I suggest populating two HashSets and

setRepresentingTableA.retainAll(setRepresentingTableB)

setRepresentingTableA will contain common whatevers.

Upvotes: 1

Suresh Atta
Suresh Atta

Reputation: 121998

The immediate answer to this is do not bring all the data into Java and compare. Just bring the data matches to your criteria.

Pass both values to a query and write a query in sql.

Upvotes: 0

Related Questions