Reputation: 133
I have a dataset in Oracle 12C which shows the path a user clicked through a website. The path is a delimited string, but I need to break the individual steps out in to multiple rows with each of the FROM and TO steps shown as two separate columns across multiple rows grouped by the original path, user & date key.
With the following sample dataset:
SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
The input data looks like this:
User Date WebPath
USER_A 2020-08-07 |A|B|C
USER_B 2020-08-06 |A|B|C
USER_B 2020-08-07 |G|H|I|J|K
So on 2020-08-07, User A went from point A to point B and then went from point B on to point C.
I need the ouput to look like the output below
USER DATE STARTPOINT ENDPOINT WEBPATH
USER_A 2020-08-07 A B |A|B|C
USER_A 2020-08-07 B C |A|B|C
USER_B 2020-08-06 A B |A|B|C
USER_B 2020-08-06 B C |A|B|C
USER_B 2020-08-07 G H |G|H|I|J|K
USER_B 2020-08-07 H I |G|H|I|J|K
USER_B 2020-08-07 I J |G|H|I|J|K
USER_B 2020-08-07 J K |G|H|I|J|K
Upvotes: 0
Views: 59
Reputation: 191560
You could use plain recursive subquery factoring:
with rcte (usr, dt, path, pos, startpoint, endpoint) as (
select usr, dt, path, 1,
regexp_substr(path, '(.*?)(\||$)', 1, 2, null, 1),
regexp_substr(path, '(.*?)(\||$)', 1, 3, null, 1)
from your_table
union all
select usr, dt, path, pos + 1,
regexp_substr(path, '(.*?)(\||$)', 1, pos + 2, null, 1),
regexp_substr(path, '(.*?)(\||$)', 1, pos + 3, null, 1)
from rcte
where regexp_substr(path, '(.*?)(\||$)', 1, pos + 3, null, 1) is not null
)
select usr, dt, startpoint, endpoint, path
from rcte
order by usr, dt, pos
which with your data gets:
USR | DT | STARTPOINT | ENDPOINT | PATH
:----- | :-------- | :--------- | :------- | :---------
USER_A | 07-AUG-20 | A | B | |A|B|C
USER_A | 07-AUG-20 | B | C | |A|B|C
USER_B | 06-AUG-20 | A | B | |A|B|C
USER_B | 06-AUG-20 | B | C | |A|B|C
USER_B | 07-AUG-20 | G | H | |G|H|I|J|K
USER_B | 07-AUG-20 | H | I | |G|H|I|J|K
USER_B | 07-AUG-20 | I | J | |G|H|I|J|K
USER_B | 07-AUG-20 | J | K | |G|H|I|J|K
Upvotes: 1
Reputation: 7937
My Oracle skills are rusty, but here's the idea which might be a little too hacky:
duplicator
returns numbers point_idx = 1, 2, ..., max_path_size
where max_path_size
found from count of |
characters in webpath
.
In the main query you do a Cartesian product of input data with duplicator, and so you get each row multiple times with different point_idx
values from 1 to max_path_size
. Use this to get a current (point_idx
th) substring in a webpath
and the next (point_idx+ 1
th) substring to print.
WITH
input (user, date, webpath) AS (
SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
),
duplicator(point_idx) AS (
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= (
SELECT MAX(REGEXP_COUNT(webpath, '|')) m
FROM input
)
)
SELECT
input.*,
REGEXP_SUBSTR(webpath, '(\|)([^|]*)', 1, point_idx , 'ix', 1)) AS startpoint,
REGEXP_SUBSTR(webpath, '(\|)([^|]*)', 1, point_idx + 1, 'ix', 1)) AS endpoint
FROM input, duplicator
WHERE point_idx <= REGEXP_COUNT(webpath, '|');
Upvotes: 0
Reputation: 8665
You can use lateral with recursive subquery + regexp_count + regexp_substr:
select *
from t
,lateral(
select
level n
, regexp_substr(t.WebPath, '\|([^|]+)',1,level ,null,1) s1
, regexp_substr(t.WebPath, '\|([^|]+)',1,level+1,null,1) s2
from dual
connect by level<regexp_count(WebPath,'\|[^|]+')
) x
Full example:
with t(UserName, DateCol, WebPath) as (
SELECT 'USER_A', '2020-08-07', '|A|B|C' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-07', '|G|H|I|J|K' FROM DUAL
UNION
SELECT 'USER_B', '2020-08-06', '|A|B|C' FROM DUAL
)
select *
from t
,lateral(
select
level n
, regexp_substr(t.WebPath, '\|([^|]+)',1,level ,null,1) s1
, regexp_substr(t.WebPath, '\|([^|]+)',1,level+1,null,1) s2
from dual
connect by level<regexp_count(WebPath,'\|[^|]+')
) x;
Result:
USERNAME DATECOL WEBPATH N S1 S2
---------- ---------- ---------- ---------- ------ ------
USER_A 2020-08-07 |A|B|C 1 A B
USER_A 2020-08-07 |A|B|C 2 B C
USER_B 2020-08-06 |A|B|C 1 A B
USER_B 2020-08-06 |A|B|C 2 B C
USER_B 2020-08-07 |G|H|I|J|K 1 G H
USER_B 2020-08-07 |G|H|I|J|K 2 H I
USER_B 2020-08-07 |G|H|I|J|K 3 I J
USER_B 2020-08-07 |G|H|I|J|K 4 J K
8 rows selected.
Upvotes: 2