Reputation: 1
I have a column value e.g. 'IC123456IC789123'. The value is an aggregation from different systems, so could be in the form of numbers or letters/numbers with no fixed format e.g. 921253 X4525186 IC543534
Before the numbers can be used in another join, the data needs to be cleaned. The is a consistent pattern with one particular system that seems to combined two numbers e.g. the example with the IC prefix
I need to create two rows for this column, but only in this circumstance. the other data should be passed through and they should be distinct values (but in some cases, per the example, they are duplicated in a column)
results
'IC123456'
'IC789123'
But it should only create one row, if it is a duplicate
'IC123456IC123456'
'IC123456'
'IC123456'
results
'IC123456'
I tried REGEXP_SUBSTR('account','number','[^IC]',1,2) AS "PARTS"
but only gives me the numbers without the 'IC' (which i need to retain). Also, gives them on a new column, but i want to retain in the 'number column' it should also return every other number which does not contain 'IC' in it.
Upvotes: 0
Views: 56
Reputation: 168623
Assuming that your values are all 8-characters long (IC
then 6 digits) then you can use simple string functions (which are much faster than regular expressions) and a recursive sub-query factoring clause:
WITH rsqfc (id, value, idx) AS (
SELECT id, value, 1 FROM table_name
UNION ALL
SELECT id, value, idx + 1 FROM rsqfc WHERE idx + 1 <= LENGTH(value)/8
)
SELECT DISTINCT
id,
SUBSTR(value, 8*idx - 7, 8) AS value
FROM rsqfc
If you have variable length terms (all starting with IC
) then you can use:
WITH rsqfc (id, value, spos, epos) AS (
SELECT id,
value,
INSTR(value, 'IC', 1, 1),
INSTR(value, 'IC', 1, 2)
FROM table_name
UNION ALL
SELECT id,
value,
epos,
INSTR(value, 'IC', epos + 2, 1)
FROM rsqfc
WHERE epos > 0
)
SELECT DISTINCT
id,
CASE epos
WHEN 0
THEN SUBSTR(value, spos)
ELSE SUBSTR(value, spos, epos - spos)
END AS value
FROM rsqfc
If you have variable length terms that take the form letters-then-digits then you can use regular expressions:
WITH rsqfc (id, value, term, epos) AS (
SELECT id,
value,
REGEXP_SUBSTR(value, '[A-Z]+\d+', 1, 1),
REGEXP_INSTR(value, '[A-Z]+\d+', 1, 1, 1)
FROM table_name
UNION ALL
SELECT id,
value,
REGEXP_SUBSTR(value, '[A-Z]+\d+', epos, 1),
REGEXP_INSTR(value, '[A-Z]+\d+', epos, 1, 1)
FROM rsqfc
WHERE epos < LENGTH(value)
)
SELECT DISTINCT
id,
term AS value
FROM rsqfc
Or, using simple string functions (which may be more to type but may be faster than regular expressions):
WITH patterns (id, value, pattern) AS (
SELECT id,
value,
TRANSLATE(
value,
'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'XXXXXXXXXXXXXXXXXXXXXXXXXX0000000000'
)
FROM table_name
),
rsqfc (id, value, pattern, spos, epos) AS (
SELECT id,
value,
pattern,
INSTR(pattern, 'X'),
INSTR(pattern, 'X', INSTR(pattern, '0'))
FROM patterns
UNION ALL
SELECT id,
value,
pattern,
epos,
INSTR(pattern, 'X', INSTR(pattern, '0', epos))
FROM rsqfc
WHERE epos > 0
)
SELECT DISTINCT
id,
CASE epos
WHEN 0
THEN SUBSTR(value, spos)
ELSE SUBSTR(value, spos, epos - spos)
END AS value
FROM rsqfc
Which, for the sample data:
CREATE TABLE table_name (id, value) AS
SELECT 1, 'IC123456IC789123' FROM DUAL UNION ALL
SELECT 2, 'IC123456IC123456' FROM DUAL;
All output:
ID | VALUE |
---|---|
1 | IC123456 |
2 | IC123456 |
1 | IC789123 |
Note: id = 2
is not duplicated.
Upvotes: 1
Reputation: 13517
You can split the string in 2 rows by IC keyword using Connect by clause and then use the distinct keyword to get the unique value -
select distinct 'IC' || trim(regexp_substr(value,'[^IC]+', 1, level) ) value
from your_table
connect by regexp_substr(value, '[^IC]+', 1, level) is not null;
Upvotes: 1