Reputation: 546
I have been through the below article:
https://community.oracle.com/tech/developers/discussion/2179837/searching-a-string-in-a-pdf-blob
It describes how to upload a document such as a pdf or word doc as a blob into a database table and then using oracle text, search for a string in that document to tell you the document that the string is in.
I cannot find a means by which you can for example search for a string in a word document and then return the paragraph, or some text that includes that string within the document.
What if you uploaded a hundred word documents,and you wanted to use oracle text to search for any that included a string? Does Oracle Text let you reference not just the document but the paragraph your string is in?
Help appreciated, I can find no examples of this unfortunately.
Upvotes: 1
Views: 450
Reputation: 1760
You can use the CTX_DOC package for this. I will show you a short example on how to use it.
First create the table:
CREATE TABLE x_files
(
id_ INTEGER,
f_name VARCHAR2(30),
p_file BLOB,
FORMAT VARCHAR2(30),
CONSTRAINT id_pk PRIMARY KEY (id_)
);
Next insert some PDFs, DOCs, TXTs and so on, afterwards create the index as:
CREATE INDEX i_x_files ON x_files(p_file) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
('DATASTORE CTXSYS.DEFAULT_DATASTORE FILTER CTXSYS.AUTO_FILTER FORMAT COLUMN FORMAT');
Finally check this example PL/SQL anonymous procedure:
SET SERVEROUTPUT ON;
DECLARE
v_query VARCHAR2(255) := 'your-query';
t_restab ctx_doc.highlight_tab;
BEGIN
FOR r_pdf IN (SELECT id_, p_file FROM x_files WHERE contains(p_file,v_query) > 0) LOOP
ctx_doc.highlight -- ctx_doc.markup is also very useful
(
index_name => 'i_x_files',
textkey => r_pdf.id_,
text_query => v_query,
restab => t_restab,
plaintext => FALSE
);
FOR I IN t_restab.FIRST..t_restab.LAST LOOP
dbms_output.put_line(
utl_raw.cast_to_varchar2(
dbms_lob.substr(r_pdf.p_file, t_restab(I).OFFSET, t_restab(I).LENGTH)
)
);
END LOOP;
dbms_output.put_line('');
END LOOP;
END;
/
Upvotes: 0