chris
chris

Reputation: 43

Best way to select out of millions of rows in an Oracle DB

G'day!

I have one million different words which I'd like to query for in a table with 15 million rows. The result of synonyms together with the word is getting processed after each query.

table looks like this:

    synonym      word
    ---------------------
    ancient      old
    anile        old
    centenarian  old
    darkened     old
    distant      far
    remote       far
    calm         gentle
    quite        gentle

This is how it is done in Java currently:

....
PreparedStatement stmt;
ResultSet wordList;
ResultSet syns;
...

stmt = conn.prepareStatement("select distinct word from table");
wordList = stmt.executeQuery();

while (wordList.next()) {
    stmt = conn.prepareStatement("select synonym from table where word=?");
    stmt.setString(1, wordList.getString(1));
    syns = stmt.executeQuery();

    process(syns, wordList.getString(1));
}
...

This is incredible slow. What's the fastest way to do stuff like this?

Cheers, Chris

Upvotes: 3

Views: 8062

Answers (7)

user55904
user55904

Reputation: 97

You should also consider utilizing the statement object's setFetchSize method to reduce the context switches between your application and the database. If you know you are going to process a million records, you should use setFetchSize(someRelativelyHighNumberLike1000). This tells java to grab up to 1000 records each time it needs more from Oracle [instead of grabbing them one at a time, which is a worst-case-scenario for this kind of batch processing operation]. This will improve the speed of your program. You should also consider refactoring and doing batch processing of your word/synonyms, as

  1. fetch 1
  2. process 1
  3. repeat

is slower than

  1. fetch 50/100/1000
  2. process 50/100/1000
  3. repeat

just hold the 50/100/1000 [or however many you retrieve at once] in some array structure until you process them.

Upvotes: 1

John Gardner
John Gardner

Reputation: 25178

related but unrelated:

while (wordList.next()) {
    stmt = conn.prepareStatement("select synonym from table where word=?");
    stmt.setString(1, wordList.getString(1));
    syns = stmt.executeQuery();

    process(syns, wordList.getString(1));
}

You should move that preparestatement call outside the loop:

stmt = conn.prepareStatement("select synonym from table where word=?");
while (wordList.next()) {
    stmt.setString(1, wordList.getString(1));
    syns = stmt.executeQuery();

    process(syns, wordList.getString(1));
}

The whole point of preparing a statement is for the db to compile/cache/etc because you're going to use the statement repeatedly. You also may need to clean up your result sets explicitly if you're going to do that many queries, to ensure that you don't run out of cursors.

Upvotes: 1

chris
chris

Reputation: 43

The problem is solved. The important point is, that the table can be sorted by word. Therefore, I can easily iterate through the whole table. Like this:

....
Statement stmt;
ResultSet rs;
String currentWord;
HashSet<String> syns = new HashSet<String>();
...

stmt = conn.createStatement();
rs = stmt.executeQuery(select word, synonym from table order by word);

rs.next();
currentWord = rs.getString(1);
syns.add(rs.getString(2));

while (rs.next()) {
    if (rs.getString(1) != currentWord) {
        process(syns, currentWord);
        syns.clear();
        currentWord = rs.getString(1);
    }
    syns.add(rs.getString(2));
}
...

Upvotes: 0

JosephStyons
JosephStyons

Reputation: 58825

PreparedStatement stmt;
ResultSet syns;
...

stmt = conn.prepareStatement("select distinct " + 
                             "  sy.synonm " + 
                             "from " +
                             "  table sy " +
                             "  table wd " +
                             "where sy.word = wd.word");
syns = stmt.executeQuery();
process(syns);

Upvotes: 1

sblundy
sblundy

Reputation: 61434

Two ideas:

a) How about making it one query:

select synonym from table where word in (select distinct word from table)

b) Or, if you process method needs to deal with them as a set of synonyms of one word, why not sort them by word and start process anew each time word is different? That query would be:

select word, synonym 
from table 
order by word

Upvotes: 4

configurator
configurator

Reputation: 41690

Why are you querying the synonyms inside the loop if you're querying all of them anyway? You should use a single select word, synonym from table order by word, and then split by words in the Java code.

Upvotes: 3

JeeBee
JeeBee

Reputation: 17556

  1. Ensure that there is an index on the 'word' column.

  2. Move the second prepareStatement outside the word loop. Each time you create a new statement, the database compiles and optimizes the query - but in this case the query is the same, so this is unnecessary.

  3. Combine the statements as sblundy above has done.

Upvotes: 5

Related Questions