GwaiTsi
GwaiTsi

Reputation: 1

SQL Oracle how to split text into rows and retain the deliminator but not duplicates

I have a column value e.g. 'IC123456IC789123'. The value is an aggregation from different systems, so could be in the form of numbers or letters/numbers with no fixed format e.g. 921253 X4525186 IC543534

Before the numbers can be used in another join, the data needs to be cleaned. The is a consistent pattern with one particular system that seems to combined two numbers e.g. the example with the IC prefix

I need to create two rows for this column, but only in this circumstance. the other data should be passed through and they should be distinct values (but in some cases, per the example, they are duplicated in a column)

results
'IC123456'
'IC789123'

But it should only create one row, if it is a duplicate

'IC123456IC123456'
'IC123456'
'IC123456'
results 
'IC123456'

I tried REGEXP_SUBSTR('account','number','[^IC]',1,2) AS "PARTS"

but only gives me the numbers without the 'IC' (which i need to retain). Also, gives them on a new column, but i want to retain in the 'number column' it should also return every other number which does not contain 'IC' in it.

Upvotes: 0

Views: 56

Answers (2)

MT0
MT0

Reputation: 168623

Assuming that your values are all 8-characters long (IC then 6 digits) then you can use simple string functions (which are much faster than regular expressions) and a recursive sub-query factoring clause:

WITH rsqfc (id, value, idx) AS (
  SELECT id, value, 1 FROM table_name
UNION ALL
  SELECT id, value, idx + 1 FROM rsqfc WHERE idx + 1 <= LENGTH(value)/8
)
SELECT DISTINCT
       id,
       SUBSTR(value, 8*idx - 7, 8) AS value
FROM   rsqfc

If you have variable length terms (all starting with IC) then you can use:

WITH rsqfc (id, value, spos, epos) AS (
  SELECT id,
         value,
         INSTR(value, 'IC', 1, 1),
         INSTR(value, 'IC', 1, 2)
  FROM   table_name
UNION ALL
  SELECT id,
         value,
         epos,
         INSTR(value, 'IC', epos + 2, 1)
  FROM   rsqfc
  WHERE  epos > 0
)
SELECT DISTINCT
       id,
       CASE epos
       WHEN 0
       THEN SUBSTR(value, spos)
       ELSE SUBSTR(value, spos, epos - spos)
       END AS value
FROM   rsqfc

If you have variable length terms that take the form letters-then-digits then you can use regular expressions:

WITH rsqfc (id, value, term, epos) AS (
  SELECT id,
         value,
         REGEXP_SUBSTR(value, '[A-Z]+\d+', 1, 1),
         REGEXP_INSTR(value, '[A-Z]+\d+', 1, 1, 1)
  FROM   table_name
UNION ALL
  SELECT id,
         value,
         REGEXP_SUBSTR(value, '[A-Z]+\d+', epos, 1),
         REGEXP_INSTR(value, '[A-Z]+\d+', epos, 1, 1)
  FROM   rsqfc
  WHERE  epos < LENGTH(value)
)
SELECT DISTINCT
       id,
       term AS value
FROM   rsqfc

Or, using simple string functions (which may be more to type but may be faster than regular expressions):

WITH patterns (id, value, pattern) AS (
  SELECT id,
         value,
         TRANSLATE(
           value,
           'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
           'XXXXXXXXXXXXXXXXXXXXXXXXXX0000000000'
         )
  FROM   table_name
),
rsqfc (id, value, pattern, spos, epos) AS (
  SELECT id,
         value,
         pattern,
         INSTR(pattern, 'X'),
         INSTR(pattern, 'X', INSTR(pattern, '0'))
  FROM   patterns
UNION ALL
  SELECT id,
         value,
         pattern,
         epos,
         INSTR(pattern, 'X', INSTR(pattern, '0', epos))
  FROM   rsqfc
  WHERE  epos > 0
)
SELECT DISTINCT
       id,
       CASE epos
       WHEN 0
       THEN SUBSTR(value, spos)
       ELSE SUBSTR(value, spos, epos - spos)
       END AS value
FROM   rsqfc

Which, for the sample data:

CREATE TABLE table_name (id, value) AS
SELECT 1, 'IC123456IC789123' FROM DUAL UNION ALL
SELECT 2, 'IC123456IC123456' FROM DUAL;

All output:

ID VALUE
1 IC123456
2 IC123456
1 IC789123

Note: id = 2 is not duplicated.

fiddle

Upvotes: 1

Ankit Bajpai
Ankit Bajpai

Reputation: 13517

You can split the string in 2 rows by IC keyword using Connect by clause and then use the distinct keyword to get the unique value -

select distinct 'IC' || trim(regexp_substr(value,'[^IC]+', 1, level) ) value
  from your_table
   connect by regexp_substr(value, '[^IC]+', 1, level) is not null;

Demo.

Upvotes: 1

Related Questions