I Z
I Z

Reputation: 5927

How to extract substrings from strings in Oracle with PL/SQL?

I am new to SQL so this question is probably fairly basic, yet I've spent quite a bit of time on it without success.

I was given a block of SQL code that I need to modify. This block currently extracts all tokens ("words") from a text index. What I need to do is to loop over each token and (1) extract all char N-grams of a given length N from it and then (2) insert them into a table.

So if a token is abcdf and N=3 then I need to extract abc, bcd, and cdf

I am struggling with (1).

I am doing it in Oracle's PL/SQL: Release 11.2.0.3.0

What I have now is this:

CREATE OR REPLACE PROCEDURE extract_ngrams
AS
     tokens ctx_doc.token_tab;
     w VARCHAR2(255);
     NgramLen NUMBER;
     ngr VARCHAR(10);
BEGIN

-- extract all tokens ("words")
CTX_DOC.TOKENS(myindex,textkey,tokens);

NgramLen := 3;

FOR i IN tokens.FIRST..tokens.LAST LOOP
  w := tokens(i).token;
  dbms_output.put_line(w);
  -- Use SUBSTRING + LEN or some other approach to extract Ngrams of length=NgramLen
  -- insert each Ngram into a table
end loop;

END;
/

What is the right way to combine FOR loop with SUBSTRING, LEN and/or whatever other appropriate functions to extract ngrams into ngr in a loop in a way similar to how tokens are extracted into w?

Thx!

Upvotes: 1

Views: 1827

Answers (1)

Rida BENHAMMANE
Rida BENHAMMANE

Reputation: 4129

Here an example of what you want :

FOR idx IN 1..(LENGTH(w)-NgramLen+1)
  LOOP
    -- do what you want here
    DBMS_OUTPUT.PUT_LINE(SUBSTR(w,idx,NgramLen));
  END LOOP;

and don't forget to declare idx NUMBER;

Upvotes: 1

Related Questions