I have a table, let's call it my_table, with the following structure:
ID   | data
-----+------
uuid | jsonb
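In DDL terms, it would look something like this (a sketch; the uuid default is an assumption):

CREATE TABLE my_table (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),  -- built in since Postgres 13; needs pgcrypto before that
    data jsonb
);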
The data in the jsonb field is an object structured in the following way:
{
  "A": [
    {"index": "1", "index2": "50"},
    {"index": "2"},
    {"index": "3"},
    {"index": "4", "index2": "10"}
  ],
  "B": "something",
  "C": "something_else"
}
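A row with that object can be inserted like so (illustrative values only):

INSERT INTO my_table (data) VALUES (
  '{"A": [{"index": "1", "index2": "50"}, {"index": "2"},
          {"index": "3"}, {"index": "4", "index2": "10"}],
    "B": "something", "C": "something_else"}'
);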
I want to add an "index2" key with a value of null to each object within "A", but only if that key does not already exist.
This is the result I want:
{
  "A": [
    {"index": "1", "index2": "50"},
    {"index": "2", "index2": null},
    {"index": "3", "index2": null},
    {"index": "4", "index2": "10"}
  ],
  "B": "something",
  "C": "something_else"
}
I have the following query that works, but it is extraordinarily slow. Running EXPLAIN on the query shows that for every row it scans the entire table (essentially O(n²) complexity).
Here's the current query:
UPDATE my_table
SET data = JSONB_SET(my_table.data, '{A}', (
SELECT JSONB_AGG( element || '{"index2": null}' )
FROM JSONB_ARRAY_ELEMENTS(my_table.data -> 'A') element
))
FROM my_table as my_table_2, JSONB_ARRAY_ELEMENTS(
my_table_2.data -> 'A'
) as element
WHERE jsonb_array_length(my_table.data -> 'A') > 0
AND element#>'{index2}' IS NULL;
How can I speed this up to get to a linear run time?
You don't need the FROM clause, which causes a CROSS JOIN and hence the quadratic behavior.
I simulated it locally, and after simplification this works fast:
explain analyse
UPDATE my_table
SET data = JSONB_SET(data, '{A}',
    -- prepend the default so that an existing "index2" wins the merge
    (SELECT JSONB_AGG('{"index2": null}' || element)
     FROM JSONB_ARRAY_ELEMENTS(data -> 'A') element))
-- only touch rows where some element of "A" lacks "index2"
WHERE data @? '$.A[*] ? (!exists(@.index2))';
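Before looking at the plan: the jsonpath filter can be sanity-checked in isolation, e.g.:

SELECT '{"A": [{"index": "1", "index2": "50"}, {"index": "2"}]}'::jsonb
       @? '$.A[*] ? (!exists(@.index2))';  -- true: the second element lacks "index2"
SELECT '{"A": [{"index": "1", "index2": "50"}]}'::jsonb
       @? '$.A[*] ? (!exists(@.index2))';  -- false: nothing to update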
With a linear plan:
Update on my_table (cost=0.00..31308.89 rows=8801 width=42) (actual time=1156.675..1156.677 rows=0 loops=1)
-> Seq Scan on my_table (cost=0.00..31308.89 rows=8801 width=42) (actual time=0.128..865.131 rows=100000 loops=1)
" Filter: (data @? '$.""A""[*]?(!(exists (@.""index2"")))'::jsonpath)"
Rows Removed by Filter: 100000
SubPlan 1
-> Aggregate (cost=1.51..1.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
-> Function Scan on jsonb_array_elements element (cost=0.01..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=4 loops=100000)
Planning Time: 0.268 ms
Execution Time: 1156.836 ms
As you can see, I tested it with 200,000 rows, half of which needed an update.
Note that you had the operands of || switched: with element || '{"index2": null}', the null would overwrite "index2" values that already exist. Also, my solution requires Postgres 12 or later; version 11 did not support jsonpath. Which version do you use?
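To see the difference, jsonb concatenation lets the right-hand operand win on key collisions:

SELECT '{"index": "1", "index2": "50"}'::jsonb || '{"index2": null}'::jsonb;
--> {"index": "1", "index2": null}   (your order: existing value lost)
SELECT '{"index2": null}'::jsonb || '{"index": "1", "index2": "50"}'::jsonb;
--> {"index": "1", "index2": "50"}   (my order: existing value kept)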
Reply to comment about Postgres 10:
For older versions, I did not manage to express the filter directly in the condition. It looks like you can define a function instead:
create or replace function misses_index2(data jsonb) returns boolean
language sql
as
$$
-- true when at least one element of "A" lacks the "index2" key
select count(*) filter (where element #> '{index2}' is null) > 0
from jsonb_array_elements(data -> 'A') element
$$;
(Tested on Postgres 13 but should work even on Postgres 10.)
Then the where condition is simply:
WHERE misses_index2(data);
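So the complete statement for a pre-12 server is the same UPDATE as above with only the filter swapped:

UPDATE my_table
SET data = JSONB_SET(data, '{A}',
    (SELECT JSONB_AGG('{"index2": null}' || element)
     FROM JSONB_ARRAY_ELEMENTS(data -> 'A') element))
WHERE misses_index2(data);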
The query is about twice as slow, but that is to be expected:
Update on my_table (cost=0.00..161007.00 rows=66990 width=42) (actual time=2413.235..2413.236 rows=0 loops=1)
-> Seq Scan on my_table (cost=0.00..161007.00 rows=66990 width=42) (actual time=0.351..2123.757 rows=100000 loops=1)
Filter: misses_index2(data)
Rows Removed by Filter: 100000
SubPlan 1
-> Aggregate (cost=1.51..1.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
-> Function Scan on jsonb_array_elements element (cost=0.01..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=4 loops=100000)
Planning Time: 0.303 ms
Execution Time: 2413.347 ms