Peter Walser
Peter Walser

Reputation: 15706

Oracle text index on table with conditional data from another table

Situation

I'm storing versioned files in an Oracle database. Each file has a number of versions (each having a distinct version number, which gets increased with each new version), and each version points to a binary data record, constituting the file version's content.

Tables (simplified):

STORAGE_FILE
- ID    NUMBER(19)

STORAGE_FILE_VERSION
- STORAGE_FILE_ID   NUMBER(19) -> references STORAGE_FILE(ID)
- DATA_ID   NUMBER(19) -> references BINARY_DATA(ID)

BINARY_DATA
- ID    NUMBER(19)
- DATA  BLOB

Problem

To search within the files, I want to create an Oracle text index, but only on the latest version of each file. The index should belong to the STORAGE_FILE, containing the indexed referenced BINARY_DATA.data from the STORAGE_FILE_VERSION with the latest (biggest) version.

Using the following query (for which I also created a View), I can determine the latest version and data for each file:

select latest_file_version.ID, latest_file_version.VERSION, b.data from (
  select f.ID, max(v.VERSION) as version from STORAGE_FILE f
  join STORAGE_FILE_VERSION v on v.STORAGE_FILE_ID = f.ID
  group by f.ID
) latest_file_version
join STORAGE_FILE_VERSION v 
    on v.STORAGE_FILE_ID = latest_file_version.ID and v.VERSION = latest_file_version.VERSION
join BINARY_DATA b on b.ID = v.DATA_ID

Oracle offers several datastore types for text indexes, but I'm not sure which one would be appropriate for my scenario:

The other data stores don't seem to fit this problem either.
Any advice on how to realize such an index without falling back to materialized views?

Upvotes: 2

Views: 337

Answers (1)

Peter Walser
Peter Walser

Reputation: 15706

The solution with the USER_DATASTORE turned out to work really well:

First, we need an indexing column on the STORAGE_FILE (to flag records for indexing, and to run Oracle text queries against it):

alter table STORAGE_FILE add INDEX_FLAG char(1) default 'Y';

A procedure is needed to provide the blob for the index. Here we want to determine the latest file version using the view (query as in question):

create or replace procedure STORAGE_FILE_DATA 
(rid in rowid, tlob in out NOCOPY blob ) is
begin
  begin
    select data.binary_data into tlob from STORAGE_FILE f
    left outer join VW_STORAGE_FILE_DATA data on f.id=data.file_id
    where f.rowid = rid and f.INDEX_FLAG='Y';
  exception
    when NO_DATA_FOUND then tlob := null;
  end;
end;
/

Next, we create the index (and preferences) which employs the procedure above:

begin
  ctx_ddl.create_preference('FILE_DATA_DATASTORE','user_datastore');
  ctx_ddl.set_attribute('FILE_DATA_DATASTORE','procedure','STORAGE_FILE_DATA');
  ctx_ddl.set_attribute('FILE_DATA_DATASTORE','output_type','blob_loc');
end;
/

create index INDEX_FILE_DATA
on STORAGE_FILE (INDEX_FLAG)
indextype is ctxsys.context
parameters ('filter ctxsys.auto_filter
             datastore FILE_DATA_DATASTORE
             memory 200M
             transactional')
/

Also, we need a trigger to flag entries for re-indexing once they are changed:

create or replace trigger TRIGGER_INDEX_FILE
before update on STORAGE_FILE
for each row
begin
  :new.INDEX_FLAG := 'Y';
end;
/

And we're done. Now we can run full-text queries on the lastest file versions' data only:

select * from STORAGE_FILE f
where contains (f.INDEX_FLAG, 'Hello AND World', 1) > 0;

Upvotes: 1

Related Questions