Reputation: 44740
I need to update every row of a table having more than 60k rows. Currently I'm doing it like this:
public void updateRank(Map<Integer, Double> map) {
    Iterator<Map.Entry<Integer, Double>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        String query = "update profile set rank = " + pairs.getValue()
                + " where profileId = " + pairs.getKey();
        DBUtil.update(query);
        it.remove();
    }
}
This method alone takes 20+ minutes to complete. I think the reason is that it hits the database once for each of the 60k rows (though I'm using DBCP for connection pooling, with 50 maximum active connections).
It would be great if I could update the rows with a single database hit. Is that possible? How?
Or is there any other way to improve the timing here?
Upvotes: 8
Views: 17762
Reputation: 8263
You could concatenate your queries (separating them with a semicolon) and send them in batches of, say, 100 statements:
public void updateRank(Map<Integer, Double> map) {
    Iterator<Map.Entry<Integer, Double>> it = map.entrySet().iterator();
    StringBuilder queries = new StringBuilder();
    int i = 0;
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        queries.append("update profile set rank = ").append(pairs.getValue())
               .append(" where profileId = ").append(pairs.getKey()).append(";");
        it.remove();
        if (i++ % 100 == 99) {
            DBUtil.update(queries.toString());
            queries.setLength(0);
        }
    }
    if (queries.length() > 0) {
        DBUtil.update(queries.toString()); // flush the remaining statements
    }
}
Upvotes: 2
Reputation: 3956
If every row should get a different value that cannot be derived from the existing data in the database, there is not much you can do to reduce the overall complexity, so do not expect wonders.
That said, you should start using prepared statements and batching:
public void updateRank(Map<Integer, Double> map) throws SQLException {
    Connection connection = getConnection(); // get the DB connection from somewhere
    PreparedStatement stmt = connection.prepareStatement(
            "update profile set rank = ? where profileId = ?");
    Iterator<Map.Entry<Integer, Double>> it = map.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<Integer, Double> pairs = it.next();
        stmt.setDouble(1, pairs.getValue()); // rank is the first placeholder
        stmt.setInt(2, pairs.getKey());      // profileId is the second
        stmt.addBatch(); // this will just collect the data values
        it.remove();
    }
    stmt.executeBatch(); // this will actually execute all the updates in one go
}
What this does: the statement is parsed and planned only once, and the parameter values for all rows are collected and sent to the database in one batch instead of one round trip per row.
In addition, make sure profileId is backed by an index so that looking up the respective row is fast enough.
Upvotes: 17
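If that index might be missing, it can be created once up front. A minimal sketch, assuming a plain JDBC Connection; the index-name convention here is a placeholder and DDL syntax varies slightly by database:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class IndexSetup {

    // Builds the DDL string; kept separate so it can be inspected/tested.
    static String createIndexSql(String table, String column) {
        return "create index idx_" + table + "_" + column
                + " on " + table + " (" + column + ")";
    }

    // Creates an index on profile(profileId) so the batched updates can
    // locate each row without a full table scan. Run once at setup time.
    public static void ensureIndex(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.executeUpdate(createIndexSql("profile", "profileId"));
        }
    }
}
```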
Reputation: 44969
Right now you execute each query independently, which causes a huge per-statement round-trip overhead (even when using connection pooling). Instead, use a batch mechanism to execute several queries together.
Using JDBC (which DBCP is apparently built on) and prepared statements, you can do this very easily with addBatch()
and executeBatch()
. I recently had to do this myself, and a batch size of around 1000 queries was the fastest, though this may be entirely different in your case.
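A rough sketch of that approach, flushing the batch every 1000 rows (the connection handling and class name here are placeholders, not from the answer):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class RankUpdater {

    private static final int BATCH_SIZE = 1000; // tune for your driver/database

    // Number of executeBatch() calls needed for a given row count.
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void updateRank(Connection connection, Map<Integer, Double> map)
            throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(
                "update profile set rank = ? where profileId = ?")) {
            int pending = 0;
            for (Map.Entry<Integer, Double> entry : map.entrySet()) {
                stmt.setDouble(1, entry.getValue());
                stmt.setInt(2, entry.getKey());
                stmt.addBatch();
                if (++pending % BATCH_SIZE == 0) {
                    stmt.executeBatch(); // send a full chunk to the database
                }
            }
            if (pending % BATCH_SIZE != 0) {
                stmt.executeBatch(); // flush the final partial chunk
            }
        }
    }
}
```

For 60k rows this sends 60 batches instead of 60,000 individual statements.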
Upvotes: 3