Devaraj Mahesan
Devaraj Mahesan

Reputation: 478

Filter the rows with number only data in a column SQL

I am trying to SELECT rows in a table, by applying a filter condition of identifying number only columns. It is a report only query, so we least bother the performance, as we dont have the privilege to compile a PL/SQL am unable to check by TO_NUMBER() and return if it is numeric or not.

I have to achieve it in SQL. Also the column is having the values like this, which have to be treated as Numbers.

-1.0
-0.1
-.1
+1,2034.89
+00000
1023

After ground breaking research, I wrote this.(Hard time)

  WITH dummy_data AS
  ( SELECT '-1.0' AS txt FROM dual
  UNION ALL
  SELECT '+0.1' FROM dual
  UNION ALL
  SELECT '-.1' FROM dual
  UNION ALL
  SELECT '+1,2034.89.00' FROM dual
  UNION ALL
  SELECT '+1,2034.89' FROM dual
  UNION ALL
  SELECT 'Deva +21' FROM dual
  UNION ALL
  SELECT '1+1' FROM dual
  UNION ALL
  SELECT '1023' FROM dual
  )
SELECT dummy_data.*,
  REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (TRANSLATE(TRIM(txt),'+,-.','0000'),'^[-+]*[[:digit:]]');

I got this.

TXT           REGEXP_COUNT(TXT,'.')
------------- ---------------------
-1.0                              4
+0.1                              4
-.1                               3
+1,2034.89.00                    13 /* Should not be returned */
+1,2034.89                       10
1+1                               3 /* Should not be returned */
1023                              4

7 rows selected.

Now terribly confused with 2 Questions.

1) I get +1,2034.89.00 too in result, I should eliminate it. (means, two decimal points) Not just decimal point, double in every other special character (-+,) should be eliminated)

2) To make it uglier, planned to do a REGEXP_COUNT('.') <= 1. But it is not returning my expectation, while selecting it, I see strange values returned.

Can someone help me to frame the REGEXP for the avoiding the double occurences of ('.','+','-')

Upvotes: 2

Views: 1760

Answers (3)

Thorsten Kettner
Thorsten Kettner

Reputation: 95101

First you remove plus and minus with translate and then you wonder why their position is not considered? :-)

This should work:

WITH dummy_data AS
  ( SELECT '-1.0' AS txt FROM dual
  UNION ALL
  SELECT '+0.1' FROM dual
  UNION ALL
  SELECT '-.1' FROM dual
  UNION ALL
  SELECT '+12034.89.00' FROM dual -- invalid: duplicate decimal separator 
  UNION ALL
  SELECT '+1,2034.89' FROM dual -- invalid: thousand separator placement
  UNION ALL
  SELECT 'Deva +21' FROM dual -- invalid: letters
  UNION ALL
  SELECT '1+1' FROM dual -- invalid: plus sign placement
  UNION ALL
  SELECT '1023' FROM dual
  UNION ALL
  SELECT '1.023,88' FROM dual -- invalid: decimal/thousand separators mixed up
  UNION ALL
  SELECT '1,234' FROM dual
  UNION ALL
  SELECT '+1,234.56' FROM dual
  UNION ALL
  SELECT '-123' FROM dual
  UNION ALL
  SELECT '+123,0000' FROM dual -- invalid: thousand separator placement
  UNION ALL
  SELECT '+234.' FROM dual -- invalid: decimal separator not followed by digits
  UNION ALL
  SELECT '12345,678' FROM dual -- invalid: missing thousand separator
  UNION ALL
  SELECT '+' FROM dual -- invalid: digits missing
  UNION ALL
  SELECT '.' FROM dual -- invalid: digits missing
  )
select * from dummy_data
where regexp_like(txt, '[[:digit:]]') and
(
  regexp_like(txt, '^[-+]{0,1}([[:digit:]]){0,3}(\,([[:digit:]]){0,3})*(\.[[:digit:]]+){0,1}$')
  or 
  regexp_like(txt, '^[-+]{0,1}[[:digit:]]*(\.[[:digit:]]+){0,1}$')
);

You see, you need three regular expressions; one to guarantee that there is at least one digit in the string, one for numbers with thousand separators, and one for numbers without.

With thousand separators: txt may start with one plus or minus sign, then there may be up to three digits. These may be followed by a thousand separator plus three digits several times. Then there may be a decimal separator with at least one following number.

Without thousand separators: txt may start with one plus or minus sign, then there may be digits. Then there may be a decimal separator with at least one following number.

I hope I haven't overlooked anything.

Upvotes: 1

Maheswaran Ravisankar
Maheswaran Ravisankar

Reputation: 17920

I just tried to correct the mistakes of you and made the SQL simple as possible. But not neat!

WITH dummy_data AS
  ( SELECT '-1.0' AS txt FROM dual
  UNION ALL
  SELECT '+.0' FROM dual
  UNION ALL
  SELECT '-.1' FROM dual
  UNION ALL
  SELECT '+1,2034.89.0' FROM dual
  UNION ALL
  SELECT '+1,2034.89' FROM dual
  UNION ALL
  SELECT 'Deva +21' FROM dual
  UNION ALL
  SELECT 'DeVA 234 Deva' FROM dual
  UNION ALL
  SELECT '1023' FROM dual
  )
SELECT to_number(REPLACE(txt,',')),
  REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (txt,'^[-+]*')
AND NOT REGEXP_LIKE (TRANSLATE(txt,'+,-.','0000'),'[^[:digit:]]')
AND REGEXP_COUNT(txt,',')  <= 1
AND REGEXP_COUNT(txt,'\+') <= 1
AND REGEXP_COUNT(txt,'\-') <= 1
AND REGEXP_COUNT(txt,'\.') <= 1;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271241

The following expression works for everything, except the commas:

'^[-+]*[0-9,]*[.]*[0-9]+$'

You can check for bad comma placement with additional checks like:

not regexp_like(txt, '[-+]*,$') and not regexp_like(txt, [',,'])

Upvotes: 1

Related Questions