Muhammad Asim
Muhammad Asim

Reputation: 159

Split hyphen separated string into rows in Oracle

I have a column in table that stores data in series. Some of the data is separated using hyphen and some is separated using comma. I want to split the data into rows. The thing is that comma separated values are treated as single values after each comma but for hyphen it means a sort of data range. For example If the string is something like this 'A1, A2, A4', it means there are 3 values and will be transformed into 3 rows. There are also string like 'A1-A4' which means there are 4 values and will be transformed into 4 rows as hyphen represents range of values stating starting and ending value.

I am able to transform comma separated values but not sure how to split hyphen separated range in oracle.

  SELECT regexp_substr('A1,A2,A4' , '[^,]+', 1, level) as a
  FROM dual
  CONNECT BY regexp_substr('A1,A2,A4', '[^,]+', 1, level) is not null

Above ddl converts the provided string into 3 rows which is fine.

  SELECT regexp_substr('A1-A4' , '[^-]+', 1, level) as a
  FROM dual
  CONNECT BY regexp_substr('A1-A4', '[^-]+', 1, level) is not null

But this above query should return 4 rows but I am not sure how to achieve this. Any thoughts ?

Upvotes: 0

Views: 1113

Answers (4)

Sayan Malakshinov
Sayan Malakshinov

Reputation: 8655

Full example with extended sample data:

with t(n, str) as (
select 1,'A1, A2, A4' from dual union all
select 2,'B1, B4, B7-B11' from dual union all
select 3,'C1, C3, C5-C7' from dual union all
select 4,'XY1, XT3, ZZ5-ZZ7' from dual 
)
select *
from t
    ,lateral(
        select level part_n, regexp_substr(str,'[^ ,]+',1,level) part
        from dual 
        connect by level<=regexp_count(str,'[^ ,]+')
     )
    ,lateral(
        select 
           level sub_part_n, 
           nvl(
              regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,1)
              ||
              (regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,2) + level -1) 
             ,part
             )
             as subpart
        from dual 
        connect by level<= regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,3)
                         - regexp_substr(part,'(\w+)(\d+)[ -]+\1(\d+)',1,1,null,2)
                         + 1
    )

Results:

         N STR                   PART_N PART       SUB_PART_N SUBPART
---------- ----------------- ---------- ---------- ---------- ----------
         1 A1, A2, A4                 1 A1                  1 A1
         1 A1, A2, A4                 2 A2                  1 A2
         1 A1, A2, A4                 3 A4                  1 A4
         2 B1, B4, B7-B11             1 B1                  1 B1
         2 B1, B4, B7-B11             2 B4                  1 B4
         2 B1, B4, B7-B11             3 B7-B11              1 B7
         2 B1, B4, B7-B11             3 B7-B11              2 B8
         2 B1, B4, B7-B11             3 B7-B11              3 B9
         2 B1, B4, B7-B11             3 B7-B11              4 B10
         2 B1, B4, B7-B11             3 B7-B11              5 B11
         3 C1, C3, C5-C7              1 C1                  1 C1
         3 C1, C3, C5-C7              2 C3                  1 C3
         3 C1, C3, C5-C7              3 C5-C7               1 C5
         3 C1, C3, C5-C7              3 C5-C7               2 C6
         3 C1, C3, C5-C7              3 C5-C7               3 C7
         4 XY1, XT3, ZZ5-ZZ7          1 XY1                 1 XY1
         4 XY1, XT3, ZZ5-ZZ7          2 XT3                 1 XT3
         4 XY1, XT3, ZZ5-ZZ7          3 ZZ5-ZZ7             1 ZZ5
         4 XY1, XT3, ZZ5-ZZ7          3 ZZ5-ZZ7             2 ZZ6
         4 XY1, XT3, ZZ5-ZZ7          3 ZZ5-ZZ7             3 ZZ7

Upvotes: 2

marcothesane
marcothesane

Reputation: 6721

Or:

CROSS JOIN your input with a series of consecutive integers, and pick the i-th occurrence of consecutive non-commas / consecutive non-hyphens....

WITH
input(sid,str) AS (
          SELECT 1,'A1,A2,A4' FROM dual
UNION ALL SELECT 2,'ANY-BY-HYPHEN' FROM dual
)
-- a set of running integer variables
,
i(i) AS (
          SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
)
SELECT
  sid
, i
, REGEXP_SUBSTR(str,'[^-,]+',1,i) AS part
FROM input CROSS JOIN i
WHERE REGEXP_SUBSTR(str,'[^-,]+',1,i) <>''
ORDER BY sid,i
;
-- out  sid | i |  part  
-- out -----+---+--------
-- out    1 | 1 | A1
-- out    1 | 2 | A2
-- out    1 | 3 | A4
-- out    2 | 1 | ANY
-- out    2 | 2 | BY
-- out    2 | 3 | HYPHEN                      

Upvotes: 0

Alex Poole
Alex Poole

Reputation: 191265

Assuming the pattern will always be a pair of values with the same prefix ('A' here) and each followed by a number, you could use a different regular expression to extract the prefix, starting number and ending number:

SELECT
  regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 1) as prefix,
  to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 2)) as start_num,
  to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 3)) as end_num
FROM dual

PREFIX  START_NUM    END_NUM
------  ---------  ---------
A               1          4

and then use that in a recursive CTE to get the values in between:

WITH rcte (prefix, num, end_num) AS (
  SELECT
    regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 1),
    to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 2)),
    to_number(regexp_substr('A1-A4' , '(.*?)(\d+)-.*?(\d+)', 1, 1, null, 3))
  FROM dual
  UNION ALL
  SELECT prefix, num + 1, end_num
  FROM rcte
  WHERE num < end_num
)
SELECT prefix || num as result
FROM rcte

RESULT
------
A1
A2
A3
A4

db<>fiddle

You could combine both approaches in one query, further assuming you don't have mixes of comma-separated values and ranges in the same string; db<>fiddle demo. If you do have a mix you could apply them in series; convert comma-separated to rows, then further process any of those new rows that are actually hyphen ranges.

Upvotes: 2

Littlefoot
Littlefoot

Reputation: 142705

If applied to a table with several rows, then you might try something like this (see comments within code):

SQL> with test (id, col) as
  2  -- sample data
  3    (select 1, 'A1,A2,A4' from dual union all
  4     select 2, 'BX8-BX11' from dual union all
  5     select 3, 'C1,C4'    from dual union all
  6     select 4, 'D6-D9'    from dual
  7    ),
  8  temp as
  9  -- split e.g. "BX8-BX11" to "BX", 8 and 11
 10    (select id,
 11            regexp_substr(col, '^[[:alpha:]]+') alp,
 12            to_number(regexp_substr(col, '\d+', 1, 1)) num1,
 13            to_number(regexp_substr(col, '\d+', 1, 2)) num2
 14     from test
 15     where instr(col, '-') > 0
 16    )
 17  -- trivial - split comma-separated values to rows
 18  select id,
 19         regexp_substr(col, '[^,]+', 1, column_value) val
 20  from test cross join table(cast(multiset(select level from dual
 21                                           connect by level <= regexp_count(col, ',') + 1
 22                                          ) as sys.odcinumberlist))
 23  where instr(col, '-') = 0
 24  union all
 25  -- create rows for values that are dash-separated
 26  select id,
 27         alp || to_char(num1 + column_value - 1) val
 28  from temp cross join table(cast(multiset(select level from dual
 29                                           connect by level <= num2 - num1 + 1
 30                                          ) as sys.odcinumberlist))
 31  order by id, val;

        ID VAL
---------- ------------------------------------------------
         1 A1
         1 A2
         1 A4
         2 BX10
         2 BX11
         2 BX8
         2 BX9
         3 C1
         3 C4
         4 D6
         4 D7
         4 D8
         4 D9

13 rows selected.

SQL>

Upvotes: 0

Related Questions