Reputation: 159
I have a column in table that stores data in series. Some of the data is separated using hyphen and some is separated using comma. I want to split the data into rows. The thing is that comma separated values are treated as single values after each comma but for hyphen it means a sort of data range.
For example If the string is something like this 'A1, A2, A4'
, it means there are 3 values and will be transformed into 3 rows. There are also string like 'A1-A4'
which means there are 4 values and will be transformed into 4 rows as hyphen represents range of values stating starting and ending value.
I am able to transform comma separated values but not sure how to split hyphen separated range in oracle.
SELECT regexp_substr('A1,A2,A4' , '[^,]+', 1, level) as a
FROM dual
CONNECT BY regexp_substr('A1,A2,A4', '[^,]+', 1, level) is not null
Above ddl converts the provided string into 3 rows which is fine.
SELECT regexp_substr('A1-A4' , '[^-]+', 1, level) as a
FROM dual
CONNECT BY regexp_substr('A1-A4', '[^-]+', 1, level) is not null
But this above query should return 4 rows but I am not sure how to achieve this. Any thoughts ?
Upvotes: 0
Views: 1113
Reputation: 8655
Full example with extended sample data:
with t(n, str) as (
select 1,'A1, A2, A4' from dual union all
select 2,'B1, B4, B7-B11' from dual union all
select 3,'C1, C3, C5-C7' from dual union all
select 4,'XY1, XT3, ZZ5-ZZ7' from dual
)
select *
from t
,lateral(
select level part_n, regexp_substr(str,'[^ ,]+',1,level) part
from dual
connect by level<=regexp_count(str,'[^ ,]+')
)
,lateral(
select
level sub_part_n,
nvl(
regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,1)
||
(regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,2) + level -1)
,part
)
as subpart
from dual
connect by level<= regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,3)
- regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,2)
+ 1
)
Results:
N STR PART_N PART SUB_PART_N SUBPART
---------- ----------------- ---------- ---------- ---------- ----------
1 A1, A2, A4 1 A1 1 A1
1 A1, A2, A4 2 A2 1 A2
1 A1, A2, A4 3 A4 1 A4
2 B1, B4, B7-B11 1 B1 1 B1
2 B1, B4, B7-B11 2 B4 1 B4
2 B1, B4, B7-B11 3 B7-B11 1 B7
2 B1, B4, B7-B11 3 B7-B11 2 B8
2 B1, B4, B7-B11 3 B7-B11 3 B9
2 B1, B4, B7-B11 3 B7-B11 4 B10
2 B1, B4, B7-B11 3 B7-B11 5 B11
3 C1, C3, C5-C7 1 C1 1 C1
3 C1, C3, C5-C7 2 C3 1 C3
3 C1, C3, C5-C7 3 C5-C7 1 C5
3 C1, C3, C5-C7 3 C5-C7 2 C6
3 C1, C3, C5-C7 3 C5-C7 3 C7
4 XY1, XT3, ZZ5-ZZ7 1 XY1 1 XY1
4 XY1, XT3, ZZ5-ZZ7 2 XT3 1 XT3
4 XY1, XT3, ZZ5-ZZ7 3 ZZ5-ZZ7 1 ZZ5
4 XY1, XT3, ZZ5-ZZ7 3 ZZ5-ZZ7 2 ZZ6
4 XY1, XT3, ZZ5-ZZ7 3 ZZ5-ZZ7 3 ZZ7
Upvotes: 2
Reputation: 6721
Or:
CROSS JOIN
your input with a series of consecutive integers, and pick the i-th occurrence of consecutive non-commas / consecutive non-hyphens....
WITH
input(sid,str) AS (
SELECT 1,'A1,A2,A4' FROM dual
UNION ALL SELECT 2,'ANY-BY-HYPHEN' FROM dual
)
-- a set of running integer variables
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
SELECT
sid
, i
, REGEXP_SUBSTR(str,'[^-,]+',1,i) AS part
FROM input CROSS JOIN i
WHERE REGEXP_SUBSTR(str,'[^-,]+',1,i) <>''
ORDER BY sid,i
;
-- out sid | i | part
-- out -----+---+--------
-- out 1 | 1 | A1
-- out 1 | 2 | A2
-- out 1 | 3 | A4
-- out 2 | 1 | ANY
-- out 2 | 2 | BY
-- out 2 | 3 | HYPHEN
Upvotes: 0
Reputation: 191265
Assuming the pattern will always be a pair of values with the same prefix ('A' here) and each followed by a number, you could use a different regular expression to extract the prefix, starting number and ending number:
SELECT
regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 1) as prefix,
to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 2)) as start_num,
to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 3)) as end_num
FROM dual
PREFIX START_NUM END_NUM
------ --------- ---------
A 1 4
and then use that in a recursive CTE to get the values in between:
WITH rcte (prefix, num, end_num) AS (
SELECT
regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 1),
to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 2)),
to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 3))
FROM dual
UNION ALL
SELECT prefix, num + 1, end_num
FROM rcte
WHERE num < end_num
)
SELECT prefix || num as result
FROM rcte
RESULT
------
A1
A2
A3
A4
You could combine both approaches in one query, further assuming you don't have mixes of comma-separated values and ranges in the same string; db<>fiddle demo. If you do have a mix you could apply them in series; convert comma-separated to rows, then further process any of those new rows that are actually hyphen ranges.
Upvotes: 2
Reputation: 142705
If applied to a table with several rows, then you might try something like this (see comments within code):
SQL> with test (id, col) as
2 -- sample data
3 (select 1, 'A1,A2,A4' from dual union all
4 select 2, 'BX8-BX11' from dual union all
5 select 3, 'C1,C4' from dual union all
6 select 4, 'D6-D9' from dual
7 ),
8 temp as
9 -- split e.g. "BX8-BX11" to "BX", 8 and 11
10 (select id,
11 regexp_substr(col, '^[[:alpha:]]+') alp,
12 to_number(regexp_substr(col, '\d+', 1, 1)) num1,
13 to_number(regexp_substr(col, '\d+', 1, 2)) num2
14 from test
15 where instr(col, '-') > 0
16 )
17 -- trivial - split comma-separated values to rows
18 select id,
19 regexp_substr(col, '[^,]+', 1, column_value) val
20 from test cross join table(cast(multiset(select level from dual
21 connect by level <= regexp_count(col, ',') + 1
22 ) as sys.odcinumberlist))
23 where instr(col, '-') = 0
24 union all
25 -- create rows for values that are dash-separated
26 select id,
27 alp || to_char(num1 + column_value - 1) val
28 from temp cross join table(cast(multiset(select level from dual
29 connect by level <= num2 - num1 + 1
30 ) as sys.odcinumberlist))
31 order by id, val;
ID VAL
---------- ------------------------------------------------
1 A1
1 A2
1 A4
2 BX10
2 BX11
2 BX8
2 BX9
3 C1
3 C4
4 D6
4 D7
4 D8
4 D9
13 rows selected.
SQL>
Upvotes: 0