Reputation: 15706
I'm storing versioned files in an Oracle database. Each file has a number of versions (each having a distinct version number, which gets increased with each new version), and each version points to a binary data record, constituting the file version's content.
Tables (simplified):
STORAGE_FILE
- ID NUMBER(19)
STORAGE_FILE_VERSION
- STORAGE_FILE_ID NUMBER(19) -> references STORAGE_FILE(ID)
- DATA_ID NUMBER(19) -> references BINARY_DATA(ID)
BINARY_DATA
- ID NUMBER(19)
- DATA BLOB
To search within the files, I want to create an Oracle text index, but only on the latest version of each file.
The index should belong to the STORAGE_FILE
, containing the indexed referenced BINARY_DATA.data
from the STORAGE_FILE_VERSION
with the latest (biggest) version.
Using the following query (for which I also created a View), I can determine the latest version and data for each file:
select latest_file_version.ID, latest_file_version.VERSION, b.data from (
select f.ID, max(v.VERSION) as version from STORAGE_FILE f
join STORAGE_FILE_VERSION v on v.STORAGE_FILE_ID = f.ID
group by f.ID
) latest_file_version
join STORAGE_FILE_VERSION v
on v.STORAGE_FILE_ID = latest_file_version.ID and v.VERSION = latest_file_version.VERSION
join BINARY_DATA b on b.ID = v.DATA_ID
Oracle offers several datastore types for text indexes, but I'm not sure which one would be appropriate for my scenario:
The other data stores don't seem to fit this problem either.
Any advice on how to realize such an index without falling back to materialized views?
Upvotes: 2
Views: 337
Reputation: 15706
The solution with the USER_DATASTORE
turned out to work really well:
First, we need an indexing column on the STORAGE_FILE
(to flag records for indexing, and to run Oracle text queries against it):
alter table STORAGE_FILE add INDEX_FLAG char(1) default 'Y';
A procedure is needed to provide the blob for the index. Here we want to determine the latest file version using the view (query as in question):
create or replace procedure STORAGE_FILE_DATA
(rid in rowid, tlob in out NOCOPY blob ) is
begin
begin
select data.binary_data into tlob from STORAGE_FILE f
left outer join VW_STORAGE_FILE_DATA data on f.id=data.file_id
where f.rowid = rid and f.INDEX_FLAG='Y';
exception
when NO_DATA_FOUND then tlob := null;
end;
end;
/
Next, we create the index (and preferences) which employs the procedure above:
begin
ctx_ddl.create_preference('FILE_DATA_DATASTORE','user_datastore');
ctx_ddl.set_attribute('FILE_DATA_DATASTORE','procedure','STORAGE_FILE_DATA');
ctx_ddl.set_attribute('FILE_DATA_DATASTORE','output_type','blob_loc');
end;
/
create index INDEX_FILE_DATA
on STORAGE_FILE (INDEX_FLAG)
indextype is ctxsys.context
parameters ('filter ctxsys.auto_filter
datastore FILE_DATA_DATASTORE
memory 200M
transactional')
/
Also, we need a trigger to flag entries for re-indexing once they are changed:
create or replace trigger TRIGGER_INDEX_FILE
before update on STORAGE_FILE
for each row
begin
:new.INDEX_FLAG := 'Y';
end;
/
And we're done. Now we can run full-text queries on the lastest file versions' data only:
select * from STORAGE_FILE f
where contains (f.INDEX_FLAG, 'Hello AND World', 1) > 0;
Upvotes: 1