Reputation: 57
User @psaraj12 helped me with a ticket here about finding ascii character in a string in my DB with the following code:
with test (col) as (
select
'L.A.D'
from
dual
union all
select
'L..D.'
from
dual
)
select
col,
case when max(ascii_of_one_character) >= 65535 then 'NOT OK' else 'OK' end result
from
(
select
col,
substr(col, column_value, 1) one_character,
ascii(
substr(col, column_value, 1)
) ascii_of_one_character
from
test cross
join table(
cast(
multiset(
select
level
from
dual connect by level <= length(col)
) as sys.odcinumberlist
)
)
)
group by
col
having max(ascii_of_one_character) >= 4000000000;
The script looks for characters of a certain range GROUP
s them and marks displays them.
Is it possible to include this in a REPLACE statement of a similar sort:
REPLACE(table.column, max(ascii_of_one_character) >= 4000000000, '')
EDIT: As per @flyaround answer this is the code I use changed a little bit:
with test (col) as (
select skunden.name1
from skunden
)
select col
, REGEXP_REPLACE(col, 'max(ascii_of_one_character)>=4000000000', '') as cleaned
, CASE WHEN REGEXP_COUNT(col, 'max(ascii_of_one_character)>=4000000000') > 0 THEN 0 ELSE 1 END as isOk
from test;
Upvotes: 0
Views: 142
Reputation: 363
Coming back to your original code, because my suggested REGEX_REPLACE is not working sufficient with high surrogates. Your approach is already very effective, so I jumped into it to have a solution here.
MERGE
INTO skunden
USING (
select
id as innerId,
name as innerName,
case when max(ascii_of_one_character) >= 65535 then 0 else 1 end isOk,
listagg(case when ascii_of_one_character <65535 then one_character end , '') within group (order by rn) as cleaned
from
(
select
id,
name,
substr(name, column_value, 1) one_character,
ascii(
substr(name, column_value, 1)
) ascii_of_one_character
, rownum as rn
from
skunden cross
join table(
cast(
multiset(
select
level
from
dual connect by level <= length(name)
) as sys.odcinumberlist
)
)
)
group by
id, name
having max(ascii_of_one_character) >= 4000000000
)
ON (skunden.id = innerId)
WHEN MATCHED THEN
UPDATE
SET name = cleaned
;
On MERGE you can't use the referencing column for an update. Therefore you should use the unique key (I used 'id' in my example) of your table.
The resulting value will be 'L..D' for your example value of 'L..D.'
Upvotes: 1
Reputation: 363
If I got your question correctly you would like to remove characters with a higher decimal representation of characters than specified. You could check to use REGEXP_REPLACE for this, like:
with test (col) as (
select
'L.A.D'
from
dual
union all
select
'L..D.'
from
dual
)
select col
, REGEXP_REPLACE(col, '[^\u00010000-\u0010FFFF]+$', '') as cleaned
, CASE WHEN REGEXP_COUNT(col, '[^\u00010000-\u0010FFFF]+$') > 0 THEN 0 ELSE 1 END as isOk
from test;
Upvotes: 1