Reputation: 478
I am trying to SELECT
rows in a table, by applying a filter condition of identifying number only columns. It is a report only query, so we least bother the performance, as we dont have the privilege to compile a PL/SQL am unable to check by TO_NUMBER()
and return if it is numeric or not.
I have to achieve it in SQL. Also the column is having the values like this, which have to be treated as Numbers.
-1.0
-0.1
-.1
+1,2034.89
+00000
1023
After ground breaking research, I wrote this.(Hard time)
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+0.1' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+1,2034.89.00' FROM dual
UNION ALL
SELECT '+1,2034.89' FROM dual
UNION ALL
SELECT 'Deva +21' FROM dual
UNION ALL
SELECT '1+1' FROM dual
UNION ALL
SELECT '1023' FROM dual
)
SELECT dummy_data.*,
REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (TRANSLATE(TRIM(txt),'+,-.','0000'),'^[-+]*[[:digit:]]');
I got this.
TXT REGEXP_COUNT(TXT,'.')
------------- ---------------------
-1.0 4
+0.1 4
-.1 3
+1,2034.89.00 13 /* Should not be returned */
+1,2034.89 10
1+1 3 /* Should not be returned */
1023 4
7 rows selected.
Now terribly confused with 2 Questions.
1) I get +1,2034.89.00
too in result, I should eliminate it. (means, two decimal points) Not just decimal point, double in every other special character (-+,) should be eliminated)
2) To make it uglier, planned to do a REGEXP_COUNT('.') <= 1
. But it is not returning my expectation, while selecting it, I see strange values returned.
Can someone help me to frame the REGEXP
for the avoiding the double occurences of ('.','+','-')
Upvotes: 2
Views: 1760
Reputation: 95101
First you remove plus and minus with translate and then you wonder why their position is not considered? :-)
This should work:
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+0.1' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+12034.89.00' FROM dual -- invalid: duplicate decimal separator
UNION ALL
SELECT '+1,2034.89' FROM dual -- invalid: thousand separator placement
UNION ALL
SELECT 'Deva +21' FROM dual -- invalid: letters
UNION ALL
SELECT '1+1' FROM dual -- invalid: plus sign placement
UNION ALL
SELECT '1023' FROM dual
UNION ALL
SELECT '1.023,88' FROM dual -- invalid: decimal/thousand separators mixed up
UNION ALL
SELECT '1,234' FROM dual
UNION ALL
SELECT '+1,234.56' FROM dual
UNION ALL
SELECT '-123' FROM dual
UNION ALL
SELECT '+123,0000' FROM dual -- invalid: thousand separator placement
UNION ALL
SELECT '+234.' FROM dual -- invalid: decimal separator not followed by digits
UNION ALL
SELECT '12345,678' FROM dual -- invalid: missing thousand separator
UNION ALL
SELECT '+' FROM dual -- invalid: digits missing
UNION ALL
SELECT '.' FROM dual -- invalid: digits missing
)
select * from dummy_data
where regexp_like(txt, '[[:digit:]]') and
(
regexp_like(txt, '^[-+]{0,1}([[:digit:]]){0,3}(\,([[:digit:]]){0,3})*(\.[[:digit:]]+){0,1}$')
or
regexp_like(txt, '^[-+]{0,1}[[:digit:]]*(\.[[:digit:]]+){0,1}$')
);
You see, you need three regular expressions; one to guarantee that there is at least one digit in the string, one for numbers with thousand separators, and one for numbers without.
With thousand separators: txt may start with one plus or minus sign, then there may be up to three digits. These may be followed by a thousand separator plus three digits several times. Then there may be a decimal separator with at least one following number.
Without thousand separators: txt may start with one plus or minus sign, then there may be digits. Then there may be a decimal separator with at least one following number.
I hope I haven't overlooked anything.
Upvotes: 1
Reputation: 17920
I just tried to correct the mistakes of you and made the SQL simple as possible. But not neat!
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+.0' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+1,2034.89.0' FROM dual
UNION ALL
SELECT '+1,2034.89' FROM dual
UNION ALL
SELECT 'Deva +21' FROM dual
UNION ALL
SELECT 'DeVA 234 Deva' FROM dual
UNION ALL
SELECT '1023' FROM dual
)
SELECT to_number(REPLACE(txt,',')),
REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (txt,'^[-+]*')
AND NOT REGEXP_LIKE (TRANSLATE(txt,'+,-.','0000'),'[^[:digit:]]')
AND REGEXP_COUNT(txt,',') <= 1
AND REGEXP_COUNT(txt,'\+') <= 1
AND REGEXP_COUNT(txt,'\-') <= 1
AND REGEXP_COUNT(txt,'\.') <= 1;
Upvotes: 0
Reputation: 1271241
The following expression works for everything, except the commas:
'^[-+]*[0-9,]*[.]*[0-9]+$'
You can check for bad comma placement with additional checks like:
not regexp_like(txt, '[-+]*,$') and not regexp_like(txt, [',,'])
Upvotes: 1