ben_muc

Reputation: 211

CachedRowSet slower than ResultSet?

In my Java code, I access an Oracle database table with a SELECT statement. I receive a lot of rows (about 50,000), so iterating them with rs.next() takes some time.

Using a plain ResultSet, processing all rows (rs.next()) takes about 30 seconds.

My goal is to speed this up, so I changed the code to use a CachedRowSet instead:

Using a CachedRowSet, processing all rows takes about 35 seconds.

I don't understand why the CachedRowSet is slower than the normal ResultSet, since the CachedRowSet retrieves all data at once, while the ResultSet fetches data every time rs.next() is called.

Here is a part of the code:

try {
    stmt = masterCon.prepareStatement(sql);
    rs = stmt.executeQuery();

    // copy all rows from the ResultSet into the in-memory CachedRowSet
    CachedRowSet crset = new CachedRowSetImpl();
    crset.populate(rs);

    // iterate over the cached copy (populate() has already consumed rs)
    while (crset.next()) {
        int countStar = crset.getInt("COUNT");
        ...
    }
} finally {
    //cleanup
}
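
For reference, since Java 7 the same CachedRowSet can be obtained through the standard javax.sql.rowset factory instead of the Sun-internal CachedRowSetImpl (a minimal sketch, equivalent to the two lines above):

import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

// standard factory instead of new CachedRowSetImpl()
CachedRowSet crset = RowSetProvider.newFactory().createCachedRowSet();
crset.populate(rs);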

Upvotes: 3

Views: 8024

Answers (5)

Naor Bar

Reputation: 2209

I've done some performance tests with and without CachedRowSet and with different fetch sizes (using an Exasol DB and the latest 7.x.x JDBC driver).

My tests and code are very straightforward:

  • First Try: executing a Select query and populating the CachedRowSet
  • Second Try: executing a Select query and populating a Map using a standard ResultSet

First Try (using a CachedRowSet):

stmt = this.connection.prepareStatement(query);
stmt.setFetchSize(this.fetchSize);
rs = stmt.executeQuery();
// copy every row of the ResultSet into an in-memory CachedRowSet
CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
crs.populate(rs);
return crs;

Results:

| Table Size | FetchSize | ExecuteQuery Time | CrsPopulate Time | Total Time |
| ---------- | --------- | ----------------- | ---------------- | ---------- |
| 50K items  | 10K       | 441.0ms           | 8124.0ms         | 8565.0ms   |
| 50K items  | 100K      | 394.0ms           | 4252.0ms         | 4646.0ms   |

Second Try (Populating a HashMap rather than a CachedRowSet):

stmt = this.connection.prepareStatement(query);
stmt.setFetchSize(this.fetchSize);
rs = stmt.executeQuery();
// read the rows straight from the ResultSet into a plain HashMap
Map<Long, String> m = new HashMap<>();
while (rs.next())
    m.put(rs.getLong("id"), rs.getString("name"));
return m;

Results:

| Table Size | FetchSize | ExecuteQuery Time | MapPopulate Time | Total Time |
| ---------- | --------- | ----------------- | ---------------- | ---------- |
| 50K items  | 10K       | 387.0ms           | 1561.0ms         | 1948.0ms   |
| 50K items  | 100K      | 389.0ms           | 1464.0ms         | 1853.0ms   |

So indeed, it looks like the CachedRowSet implementation performs poorly compared to using the ResultSet directly, but it provides a generic approach that works with different table structures.

Disclaimer:

I only used one type of data set (a table of 50K simple items). I suggest you test this on your specific use case before deciding on an approach.

Upvotes: 0

zhrist

Reputation: 1558

Using a normal ResultSet you get more optimization options, namely RowPrefetch and FetchSize.

These optimize the network transport chunks and the processing in the while loop, so rs.next() always has data to work with.

FetchSize defaults to 10 in the latest Oracle versions, but as far as I know RowPrefetch is not set, which means the network transport is not optimized at all.
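
A minimal sketch of that kind of tuning, assuming the Oracle thin driver; the connection URL, credentials, table name and the value 500 are placeholders, and the defaultRowPrefetch property is Oracle-specific, so check your driver version's documentation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", "scott");                // placeholder credentials
props.setProperty("password", "tiger");
props.setProperty("defaultRowPrefetch", "500");    // Oracle-specific: rows fetched per round trip

try (Connection con = DriverManager.getConnection(
         "jdbc:oracle:thin:@//dbhost:1521/ORCL", props);   // placeholder URL
     PreparedStatement stmt = con.prepareStatement(
         "SELECT id, name FROM some_table")) {             // placeholder query

    // standard JDBC hint for the same thing: rows transferred per network round trip
    stmt.setFetchSize(500);

    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // rs.next() now mostly reads from the already-fetched chunk
            // instead of hitting the network for every single row
        }
    }
}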

Upvotes: 0

Dan

Reputation: 41

There is an issue with CachedRowSet coupled with the Postgres JDBC driver.

CachedRowSet needs to know the types of the columns so it knows which Java objects to create (god knows what else it fetches from the DB behind the covers!).

It therefore makes extra round trips to the DB to fetch column metadata. At very high volumes this becomes a real problem, and it is also a real problem when the DB is on a remote server, because of network latency.

We've been using CachedRowSet for years and only just discovered this. We now implement our own CachedRowSet, as we never used any of its fancy features anyway. We call getString for all types and convert the values ourselves, as this seems to be the quickest way (a rough sketch of that approach is shown below).

This clearly wasn't a fetch-size issue, as the Postgres driver fetches everything by default.
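
A minimal sketch of that kind of getString-based cache; the class and method names here are made up for illustration and are not the answer author's actual code:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Minimal disconnected row cache: every column is read once as a String,
// so no extra metadata round trips are needed beyond the initial query.
public class SimpleRowCache {
    private final List<String[]> rows = new ArrayList<>();
    private final String[] columnNames;

    public SimpleRowCache(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        columnNames = new String[cols];
        for (int i = 0; i < cols; i++) {
            columnNames[i] = md.getColumnLabel(i + 1);
        }
        while (rs.next()) {
            String[] row = new String[cols];
            for (int i = 0; i < cols; i++) {
                row[i] = rs.getString(i + 1); // everything as String; callers convert as needed
            }
            rows.add(row);
        }
    }

    public int size() {
        return rows.size();
    }

    public String get(int rowIndex, int columnIndex) {
        return rows.get(rowIndex)[columnIndex];
    }

    public String columnName(int columnIndex) {
        return columnNames[columnIndex];
    }
}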

Upvotes: 4

Jon Skeet

Reputation: 1500225

What makes you think that ResultSet will retrieve the data each time rs.next() is called? It's up to the implementation exactly how it works - and I wouldn't be surprised if it fetches a chunk at a time; quite possibly a fairly large chunk.

I suspect you're basically seeing the time it takes to copy all the data into the CachedRowSet and then access it all - basically you've got an extra copying operation for no purpose.

Upvotes: 3

Tobias

Reputation: 9380

CachedRowSet caches the results in memory, i.e. you don't need the connection anymore. That is why it is "slower" in the first place.

A CachedRowSet object is a container for rows of data that caches its rows in memory, which makes it possible to operate without always being connected to its data source.

-> http://download.oracle.com/javase/1.5.0/docs/api/javax/sql/rowset/CachedRowSet.html
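
A minimal sketch of that disconnected usage, assuming a javax.sql.DataSource and a placeholder query; the point is that the cached rows stay readable after the connection, statement and result set are closed:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class DisconnectedReadExample {

    // load all rows into memory, then hand them back with no connection attached
    public static CachedRowSet load(DataSource dataSource) throws Exception {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        try (Connection con = dataSource.getConnection();
             PreparedStatement stmt = con.prepareStatement(
                 "SELECT id, name FROM some_table");   // placeholder query
             ResultSet rs = stmt.executeQuery()) {
            crs.populate(rs);                          // copies every row into memory
        }
        // connection, statement and result set are closed here,
        // but the cached rows remain readable:
        return crs;
    }

    public static void printAll(CachedRowSet crs) throws Exception {
        while (crs.next()) {
            System.out.println(crs.getLong("id") + " " + crs.getString("name"));
        }
    }
}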

Upvotes: 5
