Reputation: 13616
I have a text column that contains JSON and also plain text. I want to convert it to JSON and then select a particular property. For example:
user_data
_________
{"user": {"name": "jim"}}
{"user": {"name": "sally"}}
some random data string
I've tried:
select user_data::json#>'{user,name}' from users
I get:
ERROR: invalid input syntax for type json
DETAIL: Token "some" is invalid.
CONTEXT: JSON user_data, line 1: some...
Is it possible to prevent this?
Upvotes: 42
Views: 149517
Reputation: 679
For me, it worked when I filtered the rows.
SELECT id, column_name
FROM schema.table_name
WHERE column_name IS NOT NULL
AND column_name !~ '^\s*\{.*\}\s*$' AND column_name !~ '^\s*\[.*\]\s*$';
That returned the rows with invalid data, which I then deleted.
delete from schema.table_name where id ='553';
Then I converted the column type to jsonb from varchar.
EDIT: My use case was converting a varchar column to a JSON type. Some rows contained data that made the conversion (the ALTER TABLE statement) fail.
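For reference, once the invalid rows are gone, the conversion itself can be done with something like the following (table and column names taken from the query above; the exact statement in the answer's use case is an assumption):

```sql
-- Convert the text column to jsonb; the USING clause makes the cast explicit.
-- This fails immediately if any remaining row still holds invalid JSON.
ALTER TABLE schema.table_name
  ALTER COLUMN column_name TYPE jsonb
  USING column_name::jsonb;
```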
Upvotes: 0
Reputation: 1237
Here's another solution which has better performance:
CREATE OR REPLACE FUNCTION is_json(inp text)
RETURNS boolean LANGUAGE 'plperl' IMMUTABLE
AS $BODY$
# use JSON; # you need to preload this or use plperlu
my ($in) = @_;
my $chr = substr($in, 0, 1);
return 'f' if $chr ne '[' && $chr ne '{';
eval {
decode_json($in);
};
return $@ ? 'f' : 't';
$BODY$;
But of course you need to be familiar with plperl to use this version.
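As the comment in the function notes, trusted plperl cannot `use` modules from inside a function, so the JSON module has to be preloaded. One way to do that (assuming you can edit postgresql.conf and the Perl JSON module is installed on the server) is:

```ini
# postgresql.conf -- takes effect on server restart
plperl.on_init = 'use JSON;'
```

Alternatively, switch the function's language to plperlu and keep the `use JSON;` line inside the function body.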
If I run the query:
explain analyze select is_json(val) from mytable
this version has an execution time of about 7 ms, as opposed to klin's plpgsql function, which runs for about 12 ms. The table had 3542 rows at the time of testing. plpgsql exception-handling blocks are costly, so avoid them when you can.
Upvotes: 0
Reputation: 4661
My solution is to do a preselect to work around this issue. Let's say we have a table users with columns id and user_data. user_data is declared as a text field and should usually contain a JSON object with firstName and lastName keys.
- Rows with invalid JSON are mapped to NULL user_data (the result set still contains the id field for such a row).
- user_data ~ '^{.*}$' (the "curly brace check") tests whether JSON is contained in the user_data field. Danger! Bear in mind that this check is incomplete: malformed JSON such as '{"key1": "value1", "key2"}' still passes it, so the subsequent ::json cast can still fail.
As an example:
WITH users_json_data AS (
SELECT
id,
CASE WHEN user_data ~ '^{.*}$' THEN user_data::json END AS user_json
FROM users
)
SELECT id, users_json_data.user_json->>'firstName', users_json_data.user_json->>'lastName'
FROM users
INNER JOIN users_json_data USING (id)
WHERE users_json_data.user_json IS NOT NULL
Upvotes: 0
Reputation: 470
Others have already suggested ways to check whether the JSON is valid, but rather than just checking, why not cast it right away and handle the failure? I use this function (JSONB, but you can easily change it to JSON):
CREATE OR REPLACE FUNCTION safe_cast_to_jsonb(input TEXT) RETURNS JSONB AS
$$
DECLARE
output JSONB DEFAULT NULL;
BEGIN
BEGIN
output := input::JSONB;
EXCEPTION
WHEN OTHERS THEN RAISE NOTICE 'INVALID JSONB';
RETURN NULL;
END;
RETURN output;
END;
$$ LANGUAGE plpgsql;
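A sketch of how this could be applied to the question's table (assuming the users/user_data schema from the question):

```sql
-- Rows with invalid JSON yield NULL instead of aborting the whole query.
SELECT safe_cast_to_jsonb(user_data) #> '{user,name}' AS name
FROM users;
```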
Upvotes: 3
Reputation: 121494
Use this function:
create or replace function is_json(text)
returns boolean language plpgsql immutable as $$
begin
perform $1::json;
return true;
exception
when invalid_text_representation then
return false;
end $$;
Test:
with users(user_data) as (
values
('{"user": {"name": "jim"}}'),
('not json'),
('{"user": {"name": "sally"}}'),
('also not json')
)
select user_data::json#>'{user,name}' as name
from users
where is_json(user_data);
name
---------
"jim"
"sally"
(2 rows)
Upvotes: 13
Reputation: 3909
If you want to skip the rows with invalid JSON, you must first test if the text is valid JSON. You can do this by creating a function which will attempt to parse the value, and catch the exception for invalid JSON values.
CREATE OR REPLACE FUNCTION is_json(input_text varchar) RETURNS boolean AS $$
DECLARE
maybe_json json;
BEGIN
BEGIN
-- the assignment performs the text-to-json cast and raises on invalid input
maybe_json := input_text;
EXCEPTION WHEN others THEN
RETURN FALSE;
END;
RETURN TRUE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
When you have that, you could use the is_json function in a CASE or WHERE clause to narrow down the valid values.
-- this can eliminate invalid values
SELECT user_data::json #> '{user,name}'
FROM users WHERE is_json(user_data);
-- or this if you want to fill will NULLs
SELECT
CASE
WHEN is_json(user_data)
THEN user_data::json #> '{user,name}'
ELSE
NULL
END
FROM users;
Upvotes: 20