nwinkler
nwinkler

Reputation: 54457

Tuning Jackrabbit data model (VERSION_BUNDLE table)

As part of our application, we are using Jackrabbit (1.6.4) to store documents. Each document that is retrieved by our application is put into a folder structure in Jackrabbit, which is created if not existing.

Our DBA has noticed that the following query is executed a lot against the Oracle (11.2.0.2.0) database holding the Jackrabbit schema - more than 50000 times per hour, causing a lot of IO on the database. In fact, it is one of the top 5 SQL statements in terms of IO over elapsed time (97% IO):

    select BUNDLE_DATA from VERSION_BUNDLE where NODE_ID = :1

Taking a look at the database, one notices that this table initially only contains a single record, comprising the node_id (data type RAW) key with the DEADBEEFFACEBABECAFEBABECAFEBABE value and then a couple of bytes in the bundle_data BLOB column. Later on, more records are added with additional data.

The SQL for the table looks like this:

CREATE TABLE "VERSION_BUNDLE"
(    
    "NODE_ID" RAW(16) NOT NULL ENABLE,
    "BUNDLE_DATA" BLOB NOT NULL ENABLE
);

I have the following questions:

Update: The table only initially contains one record, additional records are added over time as decided internally by Jackrabbit. The access still seems to be read-only for most of the cases, as insert or update statements are not reported as being run with a high number.

Upvotes: 2

Views: 1863

Answers (4)

Gamby
Gamby

Reputation: 592

Why is Jackrabbit accessing this table so frequently?

We have seen that this table is accessed very often even if you are not asking for versions. Take a look to this thread from Jackrabbit users mailing list

Upvotes: 1

Robert Munteanu
Robert Munteanu

Reputation: 68288

Why is Jackrabbit accessing this table so frequently?

Then it's a sign that you're creating versions in your repository. Is that something which your application is supposed to do?

Any Jackrabbit tuning options to make this faster?

Not that I'm aware of ; one option to investigate is to upgrade to a more recent Jackrabbit version. Version 2.4.2 was just released and 1.6.4 is almost two years old. It's a possibility that there were performance improvements between these releases.

Is the BUNDLE_DATA value changed by Jackrabbit at all or is it just read for every access to the repository?

By the looks of it's the GUID of the root repository node.

Is there any way to tune the database schema to make it deal better with this scenario?

As far as I know the schema is auto-generated by Jackrabbit so the only options are to modify the table definition in a compatible way after it's been created. But that's a topic for a DBA, which I am not.

Upvotes: 1

Tobias
Tobias

Reputation: 11

If the JCR-Store is based within a Oracle database you could reorganize the underlying table.

  1. Build a hash-cluster of that table to prevent index accesses
  2. Check if you've licenses to use partitioning option
  3. By deleting unnecessary versions in your application rows will got deleted (Version prune)

If you're storing binary objects like pictures, documents - just have also a look at VERSION_BINVAL.

Upvotes: 1

David Aldridge
David Aldridge

Reputation: 52386

Is this physical i/o or logical? With the data being read that often I'd be surprised if the blocks are being aged out of the cache fast enough for physical i/o to be required.

Upvotes: 2

Related Questions