Reputation: 43
G'day!
I have one million different words that I'd like to look up in a table with 15 million rows. After each query, the resulting synonyms are processed together with the word.
The table looks like this:
synonym word
---------------------
ancient old
anile old
centenarian old
darkened old
distant far
remote far
calm gentle
quite gentle
This is how it is done in Java currently:
....
PreparedStatement stmt;
ResultSet wordList;
ResultSet syns;
...
stmt = conn.prepareStatement("select distinct word from table");
wordList = stmt.executeQuery();
while (wordList.next()) {
stmt = conn.prepareStatement("select synonym from table where word=?");
stmt.setString(1, wordList.getString(1));
syns = stmt.executeQuery();
process(syns, wordList.getString(1));
}
...
This is incredibly slow. What's the fastest way to do something like this?
Cheers, Chris
Upvotes: 3
Views: 8062
Reputation: 97
You should also consider using the statement object's setFetchSize method to reduce the number of round trips between your application and the database. If you know you are going to process a million records, call setFetchSize with a relatively high number such as 1000. This tells the JDBC driver to fetch up to 1000 rows each time it needs more from Oracle [instead of a few rows at a time (the Oracle driver's default is 10), which is a poor fit for this kind of batch processing]. This should improve the speed of your program. You should also consider refactoring to process your words/synonyms in batches, as handling them one at a time is slower than handling many at once: just hold the 50/100/1000 [or however many you retrieve at once] in some array structure until you process them.
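A minimal sketch of both suggestions, assuming an already open java.sql.Connection and a table named synonyms (a made-up name; the question only uses the placeholder "table"). The class FetchSizeSketch, the helper Batcher, the method readAll, and the value 1000 are illustrative choices, not something from the question:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class FetchSizeSketch {

    /** Buffers items and passes them to the handler in lists of at most batchSize. */
    static final class Batcher<T> {
        private final int batchSize;
        private final Consumer<List<T>> handler;
        private List<T> buffer = new ArrayList<>();

        Batcher(int batchSize, Consumer<List<T>> handler) {
            this.batchSize = batchSize;
            this.handler = handler;
        }

        void add(T item) {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Hands over whatever is buffered; call once more after the last item.
        void flush() {
            if (!buffer.isEmpty()) {
                handler.accept(buffer);
                buffer = new ArrayList<>();
            }
        }
    }

    // Reads all word/synonym pairs and delivers them in batches of 1000.
    // "synonyms" is a hypothetical table name.
    static void readAll(Connection conn, Consumer<List<String[]>> handler) throws SQLException {
        Batcher<String[]> batcher = new Batcher<>(1000, handler);
        try (Statement stmt = conn.createStatement()) {
            // Hint to the driver: fetch up to 1000 rows per round trip.
            stmt.setFetchSize(1000);
            try (ResultSet rs = stmt.executeQuery("select word, synonym from synonyms")) {
                while (rs.next()) {
                    batcher.add(new String[] { rs.getString(1), rs.getString(2) });
                }
            }
        }
        batcher.flush(); // deliver the final, possibly partial batch
    }
}
```

setFetchSize is only a hint, so how much it helps depends on the driver and on the row size; it is worth measuring with a few different values.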
Upvotes: 1
Reputation: 25178
Related, but not the main problem:
while (wordList.next()) {
stmt = conn.prepareStatement("select synonym from table where word=?");
stmt.setString(1, wordList.getString(1));
syns = stmt.executeQuery();
process(syns, wordList.getString(1));
}
You should move that prepareStatement call outside the loop:
stmt = conn.prepareStatement("select synonym from table where word=?");
while (wordList.next()) {
stmt.setString(1, wordList.getString(1));
syns = stmt.executeQuery();
process(syns, wordList.getString(1));
}
The whole point of preparing a statement is for the db to compile/cache/etc because you're going to use the statement repeatedly. You also may need to clean up your result sets explicitly if you're going to do that many queries, to ensure that you don't run out of cursors.
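One way to get both the reuse and the cleanup is try-with-resources (Java 7 and later). This is only a sketch: the class ReusedStatement, the method forEachWord, and the table name synonyms are made up for illustration, and the callback stands in for the question's process method.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class ReusedStatement {

    // Prepares the statement once, then runs it for every word.
    // Each ResultSet is closed as soon as its word has been handled,
    // so open cursors do not pile up across a million iterations.
    static void forEachWord(Connection conn,
                            Iterable<String> words,
                            BiConsumer<String, List<String>> process) throws SQLException {
        // "synonyms" is a hypothetical table name.
        String sql = "select synonym from synonyms where word = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (String word : words) {
                stmt.setString(1, word);
                try (ResultSet rs = stmt.executeQuery()) {
                    List<String> syns = new ArrayList<>();
                    while (rs.next()) {
                        syns.add(rs.getString(1));
                    }
                    process.accept(word, syns);
                }
            }
        }
    }
}
```

This still issues one query per word, so it only removes the repeated prepare and the cursor leak; it does not address the number of round trips.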
Upvotes: 1
Reputation: 43
The problem is solved. The important point is that the table can be sorted by word, so I can simply iterate through the whole table, like this:
....
Statement stmt;
ResultSet rs;
String currentWord;
HashSet<String> syns = new HashSet<String>();
...
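stmt = conn.createStatement();
rs = stmt.executeQuery("select word, synonym from table order by word");
if (rs.next()) {
    currentWord = rs.getString(1);
    syns.add(rs.getString(2));
    while (rs.next()) {
        // compare string contents with equals(); != only compares references
        if (!rs.getString(1).equals(currentWord)) {
            process(syns, currentWord);
            syns.clear();
            currentWord = rs.getString(1);
        }
        syns.add(rs.getString(2));
    }
    // the last word's synonyms are still pending once the loop ends
    process(syns, currentWord);
}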
...
Upvotes: 0
Reputation: 58825
PreparedStatement stmt;
ResultSet syns;
...
stmt = conn.prepareStatement("select distinct " +
" sy.synonym " +
"from " +
" table sy, " +
" table wd " +
"where sy.word = wd.word");
syns = stmt.executeQuery();
process(syns);
Upvotes: 1
Reputation: 61434
Two ideas:
a) How about making it one query:
select synonym from table where word in (select distinct word from table)
b) Or, if your process method needs to deal with them as a set of synonyms of one word, why not sort them by word and start process anew each time word is different? That query would be:
select word, synonym
from table
order by word
Upvotes: 4
Reputation: 41690
Why are you querying the synonyms inside the loop if you're querying all of them anyway? You should use a single select word, synonym from table order by word, and then split by words in the Java code.
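The splitting step could look roughly like this. It is a sketch that works on any iterator of [word, synonym] pairs already sorted by word, so it can be tried without a database; the names GroupByWord and groupSorted are made up, and the callback stands in for the question's process method.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiConsumer;

class GroupByWord {

    // Walks [word, synonym] rows that are sorted by word and calls
    // process once per word with all of that word's synonyms.
    static void groupSorted(Iterator<String[]> rows,
                            BiConsumer<String, Set<String>> process) {
        String currentWord = null;
        Set<String> syns = new LinkedHashSet<>();
        while (rows.hasNext()) {
            String[] row = rows.next();
            // A change of word means the previous group is complete.
            if (currentWord != null && !currentWord.equals(row[0])) {
                process.accept(currentWord, syns);
                syns = new LinkedHashSet<>();
            }
            currentWord = row[0];
            syns.add(row[1]);
        }
        // The last group is still pending when the rows run out.
        if (currentWord != null) {
            process.accept(currentWord, syns);
        }
    }
}
```

With JDBC, the iterator would be replaced by a while (rs.next()) loop over the ResultSet of the ordered query, reading the word from column 1 and the synonym from column 2. Only one word's synonyms are held in memory at a time, which matters with 15 million rows.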
Upvotes: 3
Reputation: 17556
Ensure that there is an index on the 'word' column.
Move the second prepareStatement outside the word loop. Each time you create a new statement, the database compiles and optimizes the query - but in this case the query is the same, so this is unnecessary.
Combine the statements as sblundy above has done.
Upvotes: 5