Reputation: 5927
I am new to SQL so this question is probably fairly basic, yet I've spent quite a bit of time on it without success.
I was given a block of SQL code that I need to modify. This block currently extracts all tokens ("words") from a text index. What I need to do is to loop over each token and (1) extract all char N-grams of a given length N
from it and then (2) insert them into a table.
So if a token is abcdf
and N=3
then I need to extract abc
, bcd
, and cdf
I am struggling with (1).
I am doing it in Oracle's PL/SQL: Release 11.2.0.3.0
What I have now is this:
CREATE OR REPLACE PROCEDURE extract_ngrams
AS
tokens ctx_doc.token_tab;
w VARCHAR2(255);
NgramLen NUMBER;
ngr VARCHAR(10);
BEGIN
-- extract all tokens ("words")
CTX_DOC.TOKENS(myindex,textkey,tokens);
NgramLen := 3;
FOR i IN tokens.FIRST..tokens.LAST LOOP
w := tokens(i).token;
dbms_output.put_line(w);
-- Use SUBSTRING + LEN or some other approach to extract Ngrams of length=NgramLen
-- insert each Ngram into a table
end loop;
END;
/
What is the right way to combine FOR
loop with SUBSTRING
, LEN
and/or whatever other appropriate functions to extract ngrams into ngr
in a loop in a way similar to how tokens are extracted into w
?
Thx!
Upvotes: 1
Views: 1827
Reputation: 4129
Here an example of what you want :
FOR idx IN 1..(LENGTH(w)-NgramLen+1)
LOOP
-- do what you want here
DBMS_OUTPUT.PUT_LINE(SUBSTR(w,idx,NgramLen));
END LOOP;
and don't forget to declare idx NUMBER;
Upvotes: 1