Jakub Troszok
Jakub Troszok

Reputation: 103803

How to verify a string is valid JSON in PostgreSQL?

I've got a big database with analytics data written in JSON.

I want to filter out rows with incorrect data:

I want to do something like:

select * 
from analytics 
where (is_correct_json(json::json)) 
and (is_array(json::json->>'products'))

How can I achieve that?

Upvotes: 34

Views: 33068

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657202

Postgres 16 or newer

Postgres 16 (release date late 2023) finally implements the SQL/JSON standard predicate IS JSON for this purpose exactly. The manual:

This predicate tests whether expression can be parsed as JSON, possibly of a specified type. If SCALAR or ARRAY or OBJECT is specified, the test is whether or not the JSON is of that particular type. If WITH UNIQUE KEYS is specified, then any object in the expression is also tested to see if it has duplicate keys.

To verify the string is a valid JSON literal:

... WHERE string_column IS JSON

To verify the string is a valid JSON array literal:

... WHERE string_column IS JSON ARRAY

(But note that '{"products": [1,2,3]}' is not a JSON array, but a JSON object.)

Postgres 15 or older

This is an improved version of a_horse_with_no_name's function, still using an expensive subtransaction to trap the invalid_text_representation error. But it's supposed to be faster and safer:

CREATE OR REPLACE FUNCTION f_is_json(_txt text)
  RETURNS bool
  LANGUAGE plpgsql IMMUTABLE STRICT AS
$func$
BEGIN
   RETURN _txt::json IS NOT NULL;
EXCEPTION
   WHEN SQLSTATE '22P02' THEN  -- invalid_text_representation
      RETURN false;
END
$func$;

COMMENT ON FUNCTION f_is_json(text) IS 'Test if input text is valid JSON.
Returns true, false, or NULL on NULL input.'

Major differences

  • Mark the function STRICT to get faster results for null input.

  • The first item also makes it functionally different, returning null on null input - which is literally what the STRICT modifier does, RETURNS NULL ON NULL INPUT being alternative syntax for the same. This seems more reasonable, as null is valid for the types json and jsonb (or any data type in Postgres for that matter), and we can test for true / false / null as needed.

  • Sadly, we cannot mark the function PARALLEL SAFE in Postgres 9.6 or later. The EXCEPTION part makes Postgres start a subtransaction when the function is entered, and that's not compatible with parallel execution. Related:

  • This function only traps "invalid text representation" and not any other errors - even if other errors seem hardly possible.

"f_is_json" is just my naming convention to prefix my custom functions with "f_" to clearly distinguish from built-in functions.

Quick test:

SELECT f_is_json('[]')  AS expected_t
     , f_is_json('sdf') AS expected_f
     , f_is_json(null)  AS expected_n;

fiddle

Upvotes: 23

user330315
user330315

Reputation:

This is another good example why choosing the appropriate data type right from the start helps later ;)

There is no built-in function to check if a given text is valid JSON. You can however write your own:

create or replace function is_valid_json(p_json text)
  returns boolean
as
$$
begin
  return (p_json::json is not null);
exception 
  when others then
     return false;  
end;
$$
language plpgsql
immutable;

Caution: due to the exception handling this is not going to be fast. If you call that on many invalid values this is going to slow down your select massively.

However both '{"products": 1}' and '{"products": [1,2,3]}' are valid JSON documents. The fact that the former is invalid is based on your application logic, not on the JSON syntax.

To verify that you would need a similar function, that traps errors when calling json_array_length()

create or replace function is_valid_json_array(p_json text, p_element text)
  returns boolean
as
$$
begin
  return json_array_length( p_json::json -> p_element) >= 0;
exception 
  when others then
     return false;  
end;
$$
language plpgsql
immutable;

Upvotes: 44

Related Questions