Reputation: 562
I have been given the task of importing data from a 3rd party. Not knowing what the data would look like, I imported it into a JSON field in a PostgreSQL table. Now to make things easier, I want to import the data into separate tables. I have created a dbfiddle that shows the schema I want to import from/to and the JSON I want to import.
https://www.db-fiddle.com/f/gYWUqVWvwJ7v4NtJCYhkvA/4
Here is the query I am using:
SELECT
  "Id"::integer          AS id,
  "TotalAmt"             AS total_amt,
  "CreateTime"           AS create_time,
  "InvoiceRef"::integer  AS invoice_ref,
  "CustomerRef"::integer AS customer_ref,
  ("CustomFields"->0->>'Value')::integer AS order_number
FROM jsonb_to_recordset((
       SELECT data->'Payments'
       FROM json_imports
       WHERE id = 7
     )) AS x("Id" text, "TotalAmt" decimal,
             "CreateTime" timestamptz,
             "InvoiceRef" text, "CustomerRef" text,
             "CustomFields" jsonb)
Here is the explain plan, copied from pgAdmin 4:
Function Scan on jsonb_to_recordset x  (cost=8.17..11.67 rows=100 width=56) (actual time=14.781..14.952 rows=333 loops=1)
  InitPlan 1 (returns $0)
    ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=32) (actual time=13.561..14.034 rows=1 loops=1)
          Index Cond: (id = 7)
Planning Time: 0.161 ms
Execution Time: 15.040 ms
I am wondering if the sub-select is necessary. I previously tried something similar without it:
SELECT *
FROM json_imports
CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments')
  AS x("Id" text, "TotalAmt" decimal,
       "CreateTime" timestamptz,
       "InvoiceRef" text, "CustomerRef" text,
       "CustomFields" jsonb)
WHERE id = 7
The CROSS JOIN LATERAL version was taking much more time, and its plan includes a nested-loop join, so I don't think that is what I wanted. Here is the explain plan for the cross join version:
Nested Loop  (cost=0.15..10.17 rows=100 width=257) (actual time=14.348..14.462 rows=333 loops=1)
  ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=89) (actual time=0.020..0.024 rows=1 loops=1)
        Index Cond: (id = 7)
  ->  Function Scan on jsonb_to_recordset x  (cost=0.01..1.00 rows=100 width=168) (actual time=14.321..14.350 rows=333 loops=1)
Planning Time: 0.104 ms
Execution Time: 15.107 ms
It seems a little over-the-top to use a sub-select just to drill down into the Payments array. Is there a better or more idiomatic way to select the elements of a JSON array in PostgreSQL 15?
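For comparison, another common idiom is `jsonb_array_elements()`, which unnests the array and extracts each field per element. This is only a sketch against the same assumed `json_imports(id, data)` schema as above:

```sql
-- Unnest the Payments array once, then pull fields out of each
-- element p. The LATERAL item lets p reference j.data.
SELECT
  (p->>'Id')::integer                       AS id,
  (p->>'TotalAmt')::numeric                 AS total_amt,
  (p->>'CreateTime')::timestamptz           AS create_time,
  (p->>'InvoiceRef')::integer               AS invoice_ref,
  (p->>'CustomerRef')::integer              AS customer_ref,
  (p->'CustomFields'->0->>'Value')::integer AS order_number
FROM json_imports j
CROSS JOIN LATERAL jsonb_array_elements(j.data->'Payments') AS p
WHERE j.id = 7;
```

This avoids the column definition list entirely, at the cost of writing the casts out by hand.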
Update: Erwin posted the following in his answer, which prompted me to look at the queries in question a little closer.

"You comment that the variant with a LATERAL join would 'take forever'. I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant."

I realized that the SELECT * in the second query is very different from the limited set of columns I was selecting from data->'Payments' in the first query; there is a lot of data in the other top-level keys. Using SELECT *, the second query took 3 minutes and then didn't show any data in pgAdmin 4. Changing it to SELECT x.* made the execution time similar to the first query: the first query took 0.092 seconds, and after the change the second query took 0.062 seconds.
Upvotes: 1
Views: 65
Reputation: 658472
You comment that the variant with a LATERAL join would "take forever". I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant. (Indeed, your question update shows just that.)
Either way, if you run this query on a somewhat regular basis, register the row type once:
CREATE TYPE payment_row AS (
  "Id"           integer,
  "TotalAmt"     numeric,
  "CreateTime"   timestamptz,
  "InvoiceRef"   integer,
  "CustomerRef"  integer,
  "CustomFields" jsonb
);
And use jsonb_populate_recordset() instead, so you don't have to spell out the column definition list every time:
SELECT *
FROM jsonb_populate_recordset(null::payment_row
, (SELECT data->'Payments' FROM json_imports WHERE id = 7)) p;
Or with a LATERAL join. Almost as fast and less convoluted:
SELECT p.*
FROM json_imports j
, jsonb_populate_recordset(null::payment_row, j.data->'Payments') p
WHERE j.id = 7;
The comma in the FROM clause is just short syntax for CROSS JOIN LATERAL here.
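That is, the comma form above can be spelled out explicitly; both produce the same plan (a sketch, reusing the payment_row type defined above):

```sql
-- Explicit spelling of the implicit comma join: LATERAL lets the
-- function item reference j.data from the table item to its left.
SELECT p.*
FROM json_imports j
CROSS JOIN LATERAL jsonb_populate_recordset(null::payment_row,
                                            j.data->'Payments') p
WHERE j.id = 7;
```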
Upvotes: 4
Reputation: 562
Thank you @AdrianKlaver for pointing me to JSON path queries. I came up with the following alternative. It takes about the same amount of time as my previous solution and has a similar explain plan:
SELECT
  (jsonb_path_query(data, '$.Payments[*]')->>'Id')::integer         AS id,
  (jsonb_path_query(data, '$.Payments[*]')->>'TotalAmt')::decimal   AS total_amt,
  (jsonb_path_query(data, '$.Payments[*]')->>'CreateTime')::timestamptz AS create_time,
  jsonb_path_query(data, '$.Payments[*]')->>'InvoiceRef'            AS invoice_ref,
  jsonb_path_query(data, '$.Payments[*]')->>'CustomerRef'           AS customer_ref,
  (jsonb_path_query(data, '$.Payments[*]')->'CustomFields'->0->>'Value')::integer AS order_number
FROM json_imports
WHERE id = 7
Here is the explain plan:
Result  (cost=0.15..78.17 rows=1000 width=112) (actual time=13.389..14.545 rows=333 loops=1)
  ->  ProjectSet  (cost=0.15..13.18 rows=1000 width=32) (actual time=13.364..13.910 rows=333 loops=1)
        ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=32) (actual time=0.020..0.025 rows=1 loops=1)
              Index Cond: (id = 7)
Planning Time: 0.144 ms
Execution Time: 14.602 ms
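A variant that calls jsonb_path_query() only once, by moving it into a LATERAL function item, is sketched below. Since PostgreSQL 10 the repeated set-returning calls above advance in lockstep inside a single ProjectSet node, so this should be mostly a readability change rather than a performance one:

```sql
-- Evaluate the JSON path once per Payments element; p is one
-- payment object, and the fields are extracted from it directly.
SELECT
  (p->>'Id')::integer                       AS id,
  (p->>'TotalAmt')::decimal                 AS total_amt,
  (p->>'CreateTime')::timestamptz           AS create_time,
  p->>'InvoiceRef'                          AS invoice_ref,
  p->>'CustomerRef'                         AS customer_ref,
  (p->'CustomFields'->0->>'Value')::integer AS order_number
FROM json_imports
CROSS JOIN LATERAL jsonb_path_query(data, '$.Payments[*]') AS p
WHERE id = 7;
```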
Upvotes: 0