Reputation: 103803
I've got a big database with analytics data written in JSON.
I want to filter out rows with incorrect data:
'{"hello": "world'
'{"products": [1,2,3]}'
and will leave out the '{"products": 1}'
I want to do something like:
select *
from analytics
where (is_correct_json(json::json))
and (is_array(json::json->>'products'))
How can I achieve that?
Upvotes: 34
Views: 33068
Reputation: 657202
Postgres 16 (release date late 2023) finally implements the SQL/JSON standard predicate IS JSON
for this purpose exactly. The manual:
This predicate tests whether
expression
can be parsed as JSON, possibly of a specified type. IfSCALAR
orARRAY
orOBJECT
is specified, the test is whether or not the JSON is of that particular type. IfWITH UNIQUE KEYS
is specified, then any object in theexpression
is also tested to see if it has duplicate keys.
To verify the string is a valid JSON literal:
... WHERE string_column IS JSON
To verify the string is a valid JSON array literal:
... WHERE string_column IS JSON ARRAY
(But note that '{"products": [1,2,3]}'
is not a JSON array, but a JSON object.)
This is an improved version of a_horse_with_no_name's function, still using an expensive subtransaction to trap the invalid_text_representation
error. But it's supposed to be faster and safer:
CREATE OR REPLACE FUNCTION f_is_json(_txt text)
RETURNS bool
LANGUAGE plpgsql IMMUTABLE STRICT AS
$func$
BEGIN
RETURN _txt::json IS NOT NULL;
EXCEPTION
WHEN SQLSTATE '22P02' THEN -- invalid_text_representation
RETURN false;
END
$func$;
COMMENT ON FUNCTION f_is_json(text) IS 'Test if input text is valid JSON.
Returns true, false, or NULL on NULL input.'
Mark the function STRICT
to get faster results for null
input.
The first item also makes it functionally different, returning null
on null
input - which is literally what the STRICT
modifier does, RETURNS NULL ON NULL INPUT
being alternative syntax for the same. This seems more reasonable, as null
is valid for the types json
and jsonb
(or any data type in Postgres for that matter), and we can test for true
/ false
/ null
as needed.
Sadly, we cannot mark the function PARALLEL SAFE
in Postgres 9.6 or later. The EXCEPTION
part makes Postgres start a subtransaction when the function is entered, and that's not compatible with parallel execution. Related:
This function only traps "invalid text representation" and not any other errors - even if other errors seem hardly possible.
"f_is_json" is just my naming convention to prefix my custom functions with "f_" to clearly distinguish from built-in functions.
Quick test:
SELECT f_is_json('[]') AS expected_t
, f_is_json('sdf') AS expected_f
, f_is_json(null) AS expected_n;
Upvotes: 23
Reputation:
This is another good example why choosing the appropriate data type right from the start helps later ;)
There is no built-in function to check if a given text is valid JSON. You can however write your own:
create or replace function is_valid_json(p_json text)
returns boolean
as
$$
begin
return (p_json::json is not null);
exception
when others then
return false;
end;
$$
language plpgsql
immutable;
Caution: due to the exception handling this is not going to be fast. If you call that on many invalid values this is going to slow down your select massively.
However both '{"products": 1}'
and '{"products": [1,2,3]}'
are valid JSON documents. The fact that the former is invalid is based on your application logic, not on the JSON syntax.
To verify that you would need a similar function, that traps errors when calling json_array_length()
create or replace function is_valid_json_array(p_json text, p_element text)
returns boolean
as
$$
begin
return json_array_length( p_json::json -> p_element) >= 0;
exception
when others then
return false;
end;
$$
language plpgsql
immutable;
Upvotes: 44