On_demand
On_demand

Reputation: 49

How to use regexp to clean data entry errors

I am working with numbers that are entered incorrectly in the database and I am trying to remove the data error. Some number values are entered with alphabetical values such as "27.99e". These mistakes I can easily resolve with regexp_replace. However, there are also values entered like this: "14..89" when it should have been entered like this "14.89". I tried to use the reg_exp function to fix it but it is not working, here is what I tried so far:

SELECT REGEXP_REPLACE('14..89', '[^.0-9]+', '') from dual;

SELECT regexp_replace(regexp_substr('14..89', '.*(\..*)$', 1, 1, NULL, 1), '[^.0-9]+', '') FROM dual;

The second one seemed to remove the double period issue, however, it also removed the first half of the number

If someone could please help me, that would be great. Thank you

Upvotes: 0

Views: 230

Answers (3)

MT0
MT0

Reputation: 168361

You can use:

SELECT value,
       REGEXP_REPLACE(
         value,
         '^([+-])|([eE][+-]?\d+$)|(\.)\.+|[^0-9.]',
         '\1\2\3'
       ) AS cleaned
FROM   table_name

Which, for the sample data:

CREATE TABLE table_name ( value ) AS
SELECT '123..45' FROM DUAL UNION ALL
SELECT '678e' FROM DUAL UNION ALL
SELECT '+9' FROM DUAL UNION ALL
SELECT '-10' FROM DUAL UNION ALL
SELECT '11-' FROM DUAL UNION ALL
SELECT '12..3+' FROM DUAL UNION ALL
SELECT '56e-78' FROM DUAL;

Outputs:

VALUE CLEANED
123..45 123.45
678e 678
+9 +9
-10 -10
11- 11
12..3+ 12.3
56e-78 56e-78

db<>fiddle here


When you have fixed the errors; consider converting the column to a NUMBER column so that you cannot enter free text values and get back into this situation.

Upvotes: 1

Littlefoot
Littlefoot

Reputation: 143013

Why wouldn't you simply replace two consecutive dots with a single one?

SQL> select replace('14..89', '..', '.') result from dual;

RESULT
----------
14.89

SQL>

You don't need regular expressions for that.

Upvotes: 0

Sayan Malakshinov
Sayan Malakshinov

Reputation: 8655

If you want to replace more than one dot with exactly one dot:

SELECT regexp_replace('14..89', '(\.){2,}', '.') FROM dual;

But it doesn't check if you have '1.2.3.4'...

Btw, re 'e' char: are you sure that it wasn't a scientific notation like:

SQL> select 28.39e0 as n1, 28.39e4 as n2, 28.39e-3 as n3 from dual;

        N1         N2         N3
---------- ---------- ----------
     28.39     283900     .02839

Upvotes: 0

Related Questions