user2453676

Reputation: 562

Is a sub-select necessary when using jsonb_to_recordset?

I have been given the task of importing data from a 3rd party. Not knowing what the data would look like, I imported it into a JSON field in a PostgreSQL table. Now, to make things easier, I want to move the data into separate tables. I have created a dbfiddle that shows the schema I want to import from and to, and the JSON I want to import.

https://www.db-fiddle.com/f/gYWUqVWvwJ7v4NtJCYhkvA/4
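
For reference, the staging table looks roughly like this (the exact schema is in the fiddle; the primary key and jsonb column are inferred from the queries and plans below):

CREATE TABLE json_imports (
    id   integer PRIMARY KEY,
    data jsonb
);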

Here is the query I am using:

SELECT
    "Id"::integer AS id, "TotalAmt" AS total_amt, "CreateTime" AS create_time,
    "InvoiceRef"::integer AS invoice_ref, "CustomerRef"::integer AS customer_ref,
    ("CustomFields"->0->>'Value')::integer AS order_number
FROM jsonb_to_recordset((
    SELECT data->'Payments'
    FROM json_imports
    WHERE id = 7
)) x("Id" text, "TotalAmt" decimal, "CreateTime" timestamp with time zone, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)

Here is the explain plan copied from PgAdmin4:

"Function Scan on jsonb_to_recordset x  (cost=8.17..11.67 rows=100 width=56) (actual time=14.781..14.952 rows=333 loops=1)"
"  InitPlan 1 (returns $0)"
"    ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=32) (actual time=13.561..14.034 rows=1 loops=1)"
"          Index Cond: (id = 7)"
"Planning Time: 0.161 ms"
"Execution Time: 15.040 ms"

I am wondering whether the sub-select is necessary. I had previously tried something similar:

SELECT *
FROM json_imports
CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments') AS x("Id" text, "TotalAmt" decimal, "CreateTime" timestamp with time zone, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
WHERE id = 7

The CROSS JOIN LATERAL was taking much more time, and its explain plan includes a nested loop join, so I didn't think that was what I wanted. Here is the explain plan for the CROSS JOIN LATERAL version:

"Nested Loop  (cost=0.15..10.17 rows=100 width=257) (actual time=14.348..14.462 rows=333 loops=1)"
"  ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=89) (actual time=0.020..0.024 rows=1 loops=1)"
"        Index Cond: (id = 7)"
"  ->  Function Scan on jsonb_to_recordset x  (cost=0.01..1.00 rows=100 width=168) (actual time=14.321..14.350 rows=333 loops=1)"
"Planning Time: 0.104 ms"
"Execution Time: 15.107 ms"

It seems a little over-the-top to use a sub-select just to drill down into the payments array. Is there a better or more idiomatic way to select the elements of a JSON array in PostgreSQL 15?

Update: Erwin posted the following in his answer, which prompted me to look at the queries in question a little more closely.

You comment that the variant with a LATERAL join would "take forever". I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant.

I realized that the SELECT * in the second query is very different from the limited set of columns I was selecting from data->'Payments' in the first query. There is a lot of data in the other top-level keys. Using SELECT * ran for 3 minutes and then didn't display any data in PgAdmin4. Changing it to SELECT x.* made the execution time similar to the first query's: the first query took 0.092 seconds, and after the change the second query took 0.062 seconds.
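
For reference, this is the second query with the narrower select list; only the SELECT clause changed:

SELECT x.*
FROM json_imports
CROSS JOIN LATERAL jsonb_to_recordset(data->'Payments') AS x("Id" text, "TotalAmt" decimal, "CreateTime" timestamp with time zone, "InvoiceRef" text, "CustomerRef" text, "CustomFields" jsonb)
WHERE id = 7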

Upvotes: 1

Views: 65

Answers (2)

Erwin Brandstetter

Reputation: 658472

You comment that the variant with a LATERAL join would "take forever". I doubt that. Unnesting a single array won't take long, and barely longer than the subquery variant.
(Indeed, your question update shows just that.)

Either way, if you run this query on a somewhat regular basis, register the row type once:

CREATE TYPE payment_row AS (
    "Id"           integer,
    "TotalAmt"     decimal,
    "CreateTime"   timestamptz,
    "InvoiceRef"   integer,
    "CustomerRef"  integer,
    "CustomFields" jsonb
);

And use jsonb_populate_recordset() instead, so you don't have to spell out the column definition list every time:

SELECT *
FROM   jsonb_populate_recordset(null::payment_row
                              , (SELECT data->'Payments' FROM json_imports WHERE id = 7)) p;

Or with a LATERAL join. Almost as fast and less convoluted:

SELECT p.*
FROM   json_imports j
     , jsonb_populate_recordset(null::payment_row, j.data->'Payments') p
WHERE  j.id = 7;

The comma in the FROM clause is just short syntax for CROSS JOIN LATERAL here.
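
Written out in full, the same query reads:

SELECT p.*
FROM   json_imports j
CROSS JOIN LATERAL jsonb_populate_recordset(null::payment_row, j.data->'Payments') p
WHERE  j.id = 7;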


Upvotes: 4

user2453676

Reputation: 562

Thank you, @AdrianKlaver, for pointing me to JSON path queries. I came up with the following alternative; it takes about the same amount of time as my previous solution and has a similar explain plan:

SELECT
    (jsonb_path_query(data, '$.Payments[*]')->>'Id')::integer AS id,
    (jsonb_path_query(data, '$.Payments[*]')->>'TotalAmt')::decimal AS total_amt,
    (jsonb_path_query(data, '$.Payments[*]')->>'CreateTime')::timestamp with time zone AS create_time,
    jsonb_path_query(data, '$.Payments[*]')->>'InvoiceRef' AS invoice_ref,
    jsonb_path_query(data, '$.Payments[*]')->>'CustomerRef' AS customer_ref,
    (jsonb_path_query(data, '$.Payments[*]')->'CustomFields'->0->>'Value')::integer AS order_number
FROM json_imports
WHERE id = 7

Here is the explain plan:

"Result  (cost=0.15..78.17 rows=1000 width=112) (actual time=13.389..14.545 rows=333 loops=1)"
"  ->  ProjectSet  (cost=0.15..13.18 rows=1000 width=32) (actual time=13.364..13.910 rows=333 loops=1)"
"        ->  Index Scan using json_imports_pkey on json_imports  (cost=0.15..8.17 rows=1 width=32) (actual time=0.020..0.025 rows=1 loops=1)"
"              Index Cond: (id = 7)"
"Planning Time: 0.144 ms"
"Execution Time: 14.602 ms"

Upvotes: 0
