Reputation: 3
I have a Postgres query that returns values for the following. This query returns one data for the id.
select h ->> 'dw:address_line_1' as address, h ->> 'dw:city' as city, h -> 'dw:region' -> 'dw:ID' -> 2 -> '#text' as state, h ->'dw:Country_ISO_Code' as country
from testschema.dw_data_job t, json_array_elements(t.addresses::json) h
where t.id = '12345'
and h -> 'dw:address_type' -> 'dw:ID' -> 1 ->> '#text' = 'BUSINESS'
Now I want to get BUSINESS address for entire population in the table dw_data_job
. I changed query to as follows.
select h ->> 'dw:address_line_1' as address, h ->> 'dw:city' as city, h -> 'dw:region' -> 'dw:ID' -> 2 -> '#text' as state, h ->'dw:Country_ISO_Code' as country
from testschema.dw_data_job t, testschema.dw_data_demo t1, json_array_elements(t.addresses::json) h
where t.id = t1.id
and h -> 'dw:address_type' -> 'dw:ID' -> 1 ->> '#text' = 'BUSINESS'
Now I get error as follows:
ERROR: cannot call json_array_elements on a non-array CONTEXT: parallel worker SQL state: 22023
The type of column dw_data_job.addresses
is text
and data looks like this - when the JSON value is an array:
[{"dw:address_line_1": "123 Inter Pkwy", "dw:city": "Richardson", "dw:region": {"@dw:Descriptor": "Texas", "dw:ID": [{"@dw:type": "WID", "#text": "fc77e3a1ab36487f9646d14f7242dd77"}, {"@dw:type": "Country_Region_ID", "#text": "USA-TX"}, {"@dw:type": "ISO_3166-2_Code", "#text": "TX"}]}, "dw:region_subdivision_1": "Dallas", "dw:postal_code": "75081", "dw:Country_ISO_Code": "USA", "dw:address_type": {"@dw:Descriptor": "Business", "dw:ID": [{"@dw:type": "WID", "#text": "4fae289a7fe541b098ca9448e462ff6b"}, {"@dw:type": "Communication_Usage_Type_ID", "#text": "BUSINESS"}]}, "dw:primary": "1"}, {"dw:address_line_1": "567 South Dr", "dw:city": "Plano", "dw:region": {"@dw:Descriptor": "Texas", "dw:ID": [{"@dw:type": "WID", "#text": "fc77e3a1ab36487f9646d14f7242dd77"}, {"@dw:type": "Country_Region_ID", "#text": "USA-TX"}, {"@dw:type": "ISO_3166-2_Code", "#text": "TX"}]}, "dw:region_subdivision_1": "Collin", "dw:postal_code": "75024", "dw:Country_ISO_Code": "USA", "dw:address_type": {"@dw:Descriptor": "Home", "dw:ID": [{"@dw:type": "WID", "#text": "836cf00ef5974ac08b786079866c946f"}, {"@dw:type": "Communication_Usage_Type_ID", "#text": "HOME"}]}, "dw:primary": "1"}, {"dw:address_line_1": "789 North Dr.", "dw:city": "Plano", "dw:region": {"@dw:Descriptor": "Texas", "dw:ID": [{"@dw:type": "WID", "#text": "fc77e3a1ab36487f9646d14f7242dd77"}, {"@dw:type": "Country_Region_ID", "#text": "USA-TX"}, {"@dw:type": "ISO_3166-2_Code", "#text": "TX"}]}, "dw:region_subdivision_1": "Collin", "dw:postal_code": "75024", "dw:Country_ISO_Code": "USA", "dw:address_type": {"@dw:Descriptor": "Home-Vac", "dw:ID": [{"@dw:type": "WID", "#text": "836cf00ef5974ac08b786079866c946f"}, {"@dw:type": "Communication_Usage_Type_ID", "#text": "HOME-VAC"}]}, "dw:communication_usage_behavior": {"@dw:Descriptor": "Mailing", "dw:ID": [{"@dw:type": "WID", "#text": "bea4505497c901ea53792e2628077617"}, {"@dw:type": "Communication_Usage_Behavior_Tenanted_ID", "#text": "MAILING"}]}, "dw:primary": "0"}]
Another sample value with a non-array (object):
{"dw:address_line_1": "123 Local Pkwy", "dw:city": "Cary", "dw:region": {"@dw:Descriptor": "North Carolina", "dw:ID": [{"@dw:type": "WID", "#text": "1486a0a4a8464c3b9ec482d4038deb99"}, {"@dw:type": "Country_Region_ID", "#text": "USA-NC"}, {"@dw:type": "ISO_3166-2_Code", "#text": "NC"}]}, "dw:region_subdivision_1": "Wake", "dw:postal_code": "27513", "dw:Country_ISO_Code": "USA", "dw:address_type": {"@dw:Descriptor": "Business", "dw:ID": [{"@dw:type": "WID", "#text": "4fae289a7fe541b098ca9448e462ff6b"}, {"@dw:type": "Communication_Usage_Type_ID", "#text": "BUSINESS"}]}, "dw:primary": "1"}
The query needs to accommodate both array and non-array json data to bring back address, city, state and country code for all rows of data for where the nested key #text
has the value 'BUSINESS'
.
Upvotes: 0
Views: 4755
Reputation: 656291
Like the error message tells you (and Jeff pointed out) there is one or more rows in the table dw_data_job
, where the column addresses
does not contain a valid JSON array (or NULL
). A valid JSON literal, yes, or we'd see a different error message raised by the failed cast to ::json
, but not a JSON array.
Run this query to identify violating rows:
SELECT id, addresses
, jsonb_pretty(addresses::jsonb) AS js_pretty -- optional
FROM dw_data_job
WHERE json_typeof(addresses::json) IS DISTINCT FROM 'array';
(Includes NULL values in addresses
, which would not raise the reported error, but may be a problem, too.)
To just skip rows with invalid data and proceed with your query:
SELECT h ->> 'dw:address_line_1' AS address
, h ->> 'dw:city' AS city
, h #>> '{dw:region,dw:ID,2,#text}' AS state
, h ->> 'dw:Country_ISO_Code' AS country
, json_typeof(t.addresses::json)
FROM testschema.dw_data_job t
JOIN testschema.dw_data_demo t1 USING (id)
CROSS JOIN json_array_elements(t.addresses::json) h
WHERE json_typeof(t.addresses::json) = 'array'
AND h #>> '{dw:address_type,dw:ID,1,#text}' = 'BUSINESS';
I simplified the syntax with the compact operator #>
for deeply nested values, and return all text
instead of some json
(as educated guess).
To include plain objects as well as arrays, you might:
WITH sel AS ( -- compute json & type
SELECT t.addresses::json AS a, json_typeof(t.addresses::json) AS js_type
FROM testschema.dw_data_job t
JOIN testschema.dw_data_demo t1 USING (id)
)
, obj AS ( -- unnest arrays and union with plain objects
SELECT json_array_elements(a) AS h FROM sel WHERE js_type = 'array'
UNION ALL
SELECT a FROM sel WHERE js_type = 'object'
-- all other types are ignored!
)
SELECT h ->> 'dw:address_line_1' AS address
, h ->> 'dw:city' AS city
, h #>> '{dw:region,dw:ID,2,#text}' AS state
, h ->> 'dw:Country_ISO_Code' AS country
FROM obj
WHERE h #>> '{dw:address_type,dw:ID,1,#text}' = 'BUSINESS';
db<>fiddle here
Upvotes: 2