Reputation: 17481
I have a column of type text, that contains random values. Some of them are numbers, some other texts, and some other mixed text and numbers. I'm trying to discriminate them as:
myfield is_numeric
____________________
-4 true
0004 true
4.00 true
dog false
D04 false
04f false
Whereas others are names and other strings. I was using the regular expression
SELECT id,
myfield
(myfield::varchar~ '^-?[0-9]*.?[0-9]*$') is_numeric
FROM mytable
To tell wether the row contains a valid number or not. However, I noticed that values like D04
and 04c
return true
for that regex, which, for my use case, is a false positive.
Why is this happening? It seems that ^
is matching not necessarily the whole value, but any valid substring of the value. However, values like D04f
do return false
, so even if there's a numeric substring in the field, the combination of ^
and $
operators is doing its job.
I have temporarily resorted to using:
SELECT id,
myfield
(myfield::varchar ~ '^-?[0-9]*.?[0-9]*$'
AND myfield::varchar !~ '[^0-9\-\.]') is_numeric
FROM mytable
But this seem inneficient (and does not rule out double dots), and I'm still wondering why the regex is correctly ruling out strings that start and end with a non-numeric character, while incorrecly returning true for string that contain only a trailing or leading non numeric character.
Upvotes: 9
Views: 25996
Reputation: 26133
Any solution based on regular expressions will at best reinvent PostgreSQL's own logic used for casting text
to numeric
(or any other type).
In order to stay consistent with PostgreSQL's own definition of the type, it's easiest to let the db use it to run the validation. In PostgreSQL 16 and above, there's pg_input_is_valid()
:
pg_input_is_valid ( string text, type text ) → boolean
Tests whether the given string is valid input for the specified data type, returning true or false. This function will only work as desired if the data type's input function has been updated to report invalid input as a “soft” error. Otherwise, invalid input will abort the transaction, just as if the string had been cast to the type directly.pg_input_is_valid('42', 'integer') → t pg_input_is_valid('42000000000', 'integer') → f pg_input_is_valid('1234.567', 'numeric(7,4)') → f
In PostgreSQL 15 and earlier, you can build your own:
create or replace function is_interpretable_as(arg text, arg_type text)
returns boolean language plpgsql as $$
begin
execute format('select cast(%L as %s)', arg, arg_type);
return true;
exception when others then
return false;
end $$;
Note that this will return false
regardless of whether the argument or the type is invalid, or both.
Upvotes: 0
Reputation: 3389
Would this work for you?
^-?[0-9]+\.?[0-9]*$
I'm asuming -0.07.5
is invalid (double dot present).
D04
will return false as well.
The problem in your original regex is that you're not escaping the dot, so it will match any characher, including D
in your D04
.
Hope it helps.
Upvotes: 17