Reputation: 4683
I've got this requirement to iterate over millions of records returned in resultset and process them and store them in some data structure. I have not been able to find any relevant example or reference for the same. JOOQ seems to be doing something that I may want but it seems it is not free. I was hoping if using java 8 streams I might be able to achieve it, but not example or writeup seems to give me any directions to head towards. I am open to other alternatives as well.
Based on this SO reference: resultset parallel I did attempted below but it did not give me any performance improvement as could be seen below in the performance metrics.
CODE:
Sequential Iteration:
while(rs.next()) {
System.out.println(rs.getString(1));
}
Using streams and spliterator:
Stream<String> s = StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(Long.MAX_VALUE,
Spliterator.ORDERED) {
@Override
public boolean tryAdvance(Consumer<? super String> action) {
try {
if (!rs.next())
return false;
action.accept(rs.getString(1));
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return true;
}
}, true);
s.forEach(System.out::println);
Total number of records: 3759
Time taken by Sequential: ~ 83.8 secs
Time taken by Streams: ~ 83.5 secs
Can anyone review this and tell me if I've not implemented streams correctly.
Upvotes: 4
Views: 7619
Reputation: 298469
A ResultSet
can not be processed in parallel. It is a kind of iterator, incorporating mutable state that must be altered for a query, most notably, the ResultSet
has a current row to which it must be moved, before it can read. Even for the values within the row, which are accessed by index, the specification makes no thread safety guarantees and mentions the possibility that reading them out of order might not be supported by the underlying database.
So, the only operations that could benefit from parallel processing are chained subsequent operations, but when the only chained operation is a System.out::println
, you make it worse. Not only does the print operation not benefit from parallel processing, all standard implementations have a PrintStream
in System.out
that will synchronize every write operation to the target output.
Note that even if you chain computational intense operations, which might benefit from parallel processing, it is still possible that the expensive database operation dominates the overall execution time. That’s why it is important to let the database filter and/or aggregate data as much as possible, before transferring them to the Java side…
Upvotes: 8