Cameron Castillo
Cameron Castillo

Reputation: 2842

Oracle: Split many strings into separate words

I'm using Oracle 11g, and I would like to split a column (JobDescription) from the Persons table into separate words.
I.e. If Person A's Job Description is "Professional StackOverflow Contributor", I would like to populate another table with 3 rows containing the 3 words from the Job Description.

From another post here, I managed to get the following which is working for smaller sets of data. But my table contains just less that 500 000 records and the statement has now been running for 2 days and it's still going.

INSERT INTO WORDS (PersonID, Department, Word)
SELECT distinct PersonID, Department, trim(regexp_substr(str, '[^,]+', 1, level)) 
FROM (SELECT PersonID, Department, trim(Replace(JobDescription, ' ', ',')) str 
      FROM Persons) t
CONNECT BY instr(  str  , ',', 1, level - 1) > 0;

Are there another option that might result in quicker results?

Upvotes: 1

Views: 3839

Answers (1)

Chris
Chris

Reputation: 4231

For a one-off job, I see no reason not to go procedural. This should be quick enough (250 seconds for a 2.5 million row table on my system). Change the size of the varchar2 variables if your words can be longer than 40 characters.

create or replace procedure tmp_split_job as
  TYPE wtype IS TABLE OF NUMBER INDEX BY VARCHAR2(40);
  uwords wtype;
  w varchar2(40);
  i pls_integer;
  n pls_integer;
  p pls_integer;
  cursor c_fetch is select PersonID, Department, JobDescription from Persons where JobDescription is not null;
begin
  for v_row in c_fetch loop
    n := length(v_row.JobDescription);
    i := 1;
    while i <= n loop
      p := instr(v_row.JobDescription, ' ', i);
      if p > 1 then
        w := substr(v_row.JobDescription, i, p-i);
        i := p + 1;
      else
        w := substr(v_row.JobDescription, i);
        i := n + 1;
      end if;
      uwords(w) := 1;
    end loop;
    w := uwords.FIRST;
    while w is not null loop
      insert into words (PersonID, Department, Word) values (v_row.PersonID, v_row.Department, w);
      w := uwords.next(w);
    end loop;
    uwords.DELETE;
  end loop;
end;
/

exec tmp_split_job;
drop procedure tmp_split_job;

Upvotes: 0

Related Questions