oracle sql split text into columns based on each occurrence of a certain character set

Question

In our database (Oracle), there is a field named CONVERSATION containing speech to text records (formatted as CLOB). After some pre-processing and replacement of unnecessary characters, currently this field has a format as the example below.

I want to split texts of agents and customers into separate columns. And I want them separeted by comma for each part starts with "a:" or "c:". How can I do that?

"a:" stands for agent and "c:" stands for customer

CREATE TABLE TEXT_RECORDS (
    CONVERSATION CLOB
    );

INSERT INTO TEXT_RECORDS
(CONVERSATION)
VALUES
('a:some text 1 c:some text 2 a:some text 3 c:some text 4 a:some text 5 c:some text 6');

--EDITED (previously it was 'a:some_text_1 c:some_text_2 a:some_text_3 c:some_text_4 a:some_text_5 c:some_text_6')

Desired output as two separate fields:

CONV_AGENT                              CONV_CUSTOMER
some text 1 ,some text 3, some text 5   some text 2 ,some text 4, some text 6

MT0 · Accepted Answer

You can just remove the sub-strings which do not have the correct prefix:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE TEXT_RECORDS (
    CONVERSATION CLOB
    );

INSERT INTO TEXT_RECORDS(CONVERSATION)
SELECT 'a:some_text_1 c:some_text_2 a:some_text_3 c:some_text_4 a:some_text_5 c:some_text_6' FROM DUAL UNION ALL
SELECT 'a:some_text_1 a:some_text_2 a:some_text_3' FROM DUAL UNION ALL
SELECT 'c:some_text_1 a:some_text_2 a:some_text_3 c:some_text_4' FROM DUAL;

Query 1:

SELECT REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE(
             conversation,
             '.*?(a:(\S+))?(\s|$)',  -- Find each word starting with "a:"
             '\2, '                  -- replace with just that part without prefix
           ),
           '(, ){2,}', -- Replace multiple delimiters
           ', '        -- With a single delimiter
         ),
         '^, |, $'     -- Remove leading and trailing delimiters
       ) AS conv_agent,
       REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE(
             conversation,
             '.*?(c:(\S+))?(\s|$)',  -- Find each word starting with "c:"
             '\2, '                  -- replace with just that part without prefix
           ),
           '(, ){2,}', -- Replace multiple delimiters
           ', '        -- With a single delimiter
         ),
         '^, |, $'     -- Remove leading and trailing delimiters
       ) AS conv_customer
FROM   text_records

Results:

|                            CONV_AGENT |                         CONV_CUSTOMER |
|---------------------------------------|---------------------------------------|
| some_text_1, some_text_3, some_text_5 | some_text_2, some_text_4, some_text_6 |
| some_text_1, some_text_2, some_text_3 |                                       |
|              some_text_2, some_text_3 |              some_text_1, some_text_4 |

Updated - Spaces in conversation sentences

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE TEXT_RECORDS (
    CONVERSATION CLOB
    );

INSERT INTO TEXT_RECORDS(CONVERSATION)
SELECT 'a:some text 1 c:some text 2 a:some text 3 c:some text 4 a:some text 5 c:some text 6' FROM DUAL UNION ALL
SELECT 'a:some text 1 a:some text 2 a:some text 3' FROM DUAL UNION ALL
SELECT 'c:some text 1 a:some text 2 a:some text 3 c:some text 4' FROM DUAL;

Query 1:

SELECT REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE(
             conversation,
             '.*?(a:([^:]*))?(\s|$)',
             '\2, '
           ),
           '(, ){2,}',
           ', '
         ),
         '^, |, $'
       ) AS conv_agent,
       REGEXP_REPLACE(
         REGEXP_REPLACE(
           REGEXP_REPLACE(
             conversation,
             '.*?(c:([^:]*))?(\s|$)',
             '\2, '
           ),
           '(, ){2,}',
           ', '
         ),
         '^, |, $'
       ) AS conv_customer
FROM   text_records

Results:

|                            CONV_AGENT |                         CONV_CUSTOMER |
|---------------------------------------|---------------------------------------|
| some text 1, some text 3, some text 5 | some text 2, some text 4, some text 6 |
| some text 1, some text 2, some text 3 |                                       |
|              some text 2, some text 3 |              some text 1, some text 4 |

oracle sql split text into columns based on each occurrence of a certain character set

Answers (2)

Related Questions