suchafunkymonkey
suchafunkymonkey

Reputation: 133

Splitting a delimited string in to multiple columns and rows

I have a dataset in Oracle 12C which shows the path a user clicked through a website. The path is a delimited string, but I need to break the individual steps out in to multiple rows with each of the FROM and TO steps shown as two separate columns across multiple rows grouped by the original path, user & date key.

With the following sample dataset:

SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL

The input data looks like this:

User    Date        WebPath
USER_A  2020-08-07  |A|B|C
USER_B  2020-08-06  |A|B|C
USER_B  2020-08-07  |G|H|I|J|K

So on 2020-08-07, User A went from point A to point B and then went from point B on to point C.

I need the ouput to look like the output below

USER    DATE        STARTPOINT  ENDPOINT     WEBPATH
USER_A  2020-08-07  A           B            |A|B|C
USER_A  2020-08-07  B           C            |A|B|C   
USER_B  2020-08-06  A           B            |A|B|C
USER_B  2020-08-06  B           C            |A|B|C
USER_B  2020-08-07  G           H            |G|H|I|J|K 
USER_B  2020-08-07  H           I            |G|H|I|J|K
USER_B  2020-08-07  I           J            |G|H|I|J|K
USER_B  2020-08-07  J           K            |G|H|I|J|K

Upvotes: 0

Views: 59

Answers (3)

Alex Poole
Alex Poole

Reputation: 191560

You could use plain recursive subquery factoring:

with rcte (usr, dt, path, pos, startpoint, endpoint) as (
  select usr, dt, path, 1,
    regexp_substr(path, '(.*?)(\||$)', 1, 2, null, 1),
    regexp_substr(path, '(.*?)(\||$)', 1, 3, null, 1)
  from your_table
  union all
  select usr, dt, path, pos + 1,
    regexp_substr(path, '(.*?)(\||$)', 1, pos + 2, null, 1),
    regexp_substr(path, '(.*?)(\||$)', 1, pos + 3, null, 1)
  from rcte
  where regexp_substr(path, '(.*?)(\||$)', 1, pos + 3, null, 1) is not null
)
select usr, dt, startpoint, endpoint, path
from rcte
order by usr, dt, pos

which with your data gets:

USR    | DT        | STARTPOINT | ENDPOINT | PATH      
:----- | :-------- | :--------- | :------- | :---------
USER_A | 07-AUG-20 | A          | B        | |A|B|C    
USER_A | 07-AUG-20 | B          | C        | |A|B|C    
USER_B | 06-AUG-20 | A          | B        | |A|B|C    
USER_B | 06-AUG-20 | B          | C        | |A|B|C    
USER_B | 07-AUG-20 | G          | H        | |G|H|I|J|K
USER_B | 07-AUG-20 | H          | I        | |G|H|I|J|K
USER_B | 07-AUG-20 | I          | J        | |G|H|I|J|K
USER_B | 07-AUG-20 | J          | K        | |G|H|I|J|K

db<>fiddle

Upvotes: 1

Alexey S. Larionov
Alexey S. Larionov

Reputation: 7937

My Oracle skills are rusty, but here's the idea which might be a little too hacky:

duplicator returns numbers point_idx = 1, 2, ..., max_path_size where max_path_size found from count of | characters in webpath.

In the main query you do a Cartesian product of input data with duplicator, and so you get each row multiple times with different point_idx values from 1 to max_path_size. Use this to get a current (point_idxth) substring in a webpath and the next (point_idx+ 1th) substring to print.

WITH
input (user, date, webpath) AS (
  SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL UNION
  SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL UNION
  SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
),
duplicator(point_idx) AS (
  SELECT LEVEL
  FROM DUAL 
  CONNECT BY LEVEL <= (
    SELECT MAX(REGEXP_COUNT(webpath, '|')) m
    FROM input
  ) 
)
SELECT 
  input.*, 
  REGEXP_SUBSTR(webpath, '(\|)([^|]*)', 1, point_idx ,    'ix', 1)) AS startpoint, 
  REGEXP_SUBSTR(webpath, '(\|)([^|]*)', 1, point_idx + 1, 'ix', 1)) AS endpoint
FROM input, duplicator
WHERE point_idx <= REGEXP_COUNT(webpath, '|');

Upvotes: 0

Sayan Malakshinov
Sayan Malakshinov

Reputation: 8665

You can use lateral with recursive subquery + regexp_count + regexp_substr:

select *
from t
    ,lateral(
       select
          level n
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level  ,null,1) s1
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level+1,null,1) s2
       from dual
       connect by level<regexp_count(WebPath,'\|[^|]+')
    ) x

Full example:

with t(UserName, DateCol, WebPath) as (
SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
)
select *
from t
    ,lateral(
       select
          level n
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level  ,null,1) s1
         , regexp_substr(t.WebPath, '\|([^|]+)',1,level+1,null,1) s2
       from dual
       connect by level<regexp_count(WebPath,'\|[^|]+')
    ) x;

Result:


USERNAME   DATECOL    WEBPATH             N S1     S2
---------- ---------- ---------- ---------- ------ ------
USER_A     2020-08-07 |A|B|C              1 A      B
USER_A     2020-08-07 |A|B|C              2 B      C
USER_B     2020-08-06 |A|B|C              1 A      B
USER_B     2020-08-06 |A|B|C              2 B      C
USER_B     2020-08-07 |G|H|I|J|K          1 G      H
USER_B     2020-08-07 |G|H|I|J|K          2 H      I
USER_B     2020-08-07 |G|H|I|J|K          3 I      J
USER_B     2020-08-07 |G|H|I|J|K          4 J      K

8 rows selected.

Upvotes: 2

Related Questions