Atul Jalan

Reputation: 23

Add key/value pair to all objects inside a jsonb array in POSTGRES if key does not already exist

I have a table, let's call it my_table with the following structure

ID   | data
____________
uuid | jsonb

The data in the jsonb field is an object structured in the following way:

{
  "A":[
    {"index":"1", "index2":"50"},
    {"index":"2"},
    {"index":"3"},
    {"index":"4", "index2":"10"}
  ],
  "B": "something",
  "C": "something_else"
}

I want to add an "index2" key to each object within "A" with a value of null, but only if that key does not already exist.

This is the result I want:

{
  "A":[
    {"index":"1", "index2":"50"},
    {"index":"2", "index2":null},
    {"index":"3", "index2":null},
    {"index":"4", "index2":"10"}
  ],
  "B": "something",
  "C": "something_else"
}

I have the following query that works, but it is extraordinarily slow. Running EXPLAIN on it shows that, for every row, the entire table is scanned (essentially n^2 complexity).

Here's the current query:

UPDATE my_table
SET data = JSONB_SET(my_table.data, '{A}', (
    SELECT JSONB_AGG( element || '{"index2": null}' )
    FROM JSONB_ARRAY_ELEMENTS(my_table.data -> 'A') element
))
FROM my_table as my_table_2, JSONB_ARRAY_ELEMENTS(
    my_table_2.data -> 'A'
) as element
WHERE jsonb_array_length(my_table.data -> 'A') > 0
AND element#>'{index2}' IS NULL;

How can I speed this up to get to a linear run time?

Upvotes: 0

Views: 845

Answers (1)

juriad

Reputation: 44

You don't need the FROM clause: it joins every target row against a second scan of my_table (and its unnested elements) with no join condition, and that CROSS JOIN is what causes the quadratic behavior.

I simulated it locally, and after simplification this works fast:

explain analyse
UPDATE my_table
SET data = JSONB_SET(data, '{A}',
                     (SELECT JSONB_AGG('{"index2": null}' || element)
                      FROM JSONB_ARRAY_ELEMENTS(data -> 'A') element))
WHERE data @? '$.A[*] ? (!exists(@.index2))';

With a linear plan:

Update on my_table  (cost=0.00..31308.89 rows=8801 width=42) (actual time=1156.675..1156.677 rows=0 loops=1)
  ->  Seq Scan on my_table  (cost=0.00..31308.89 rows=8801 width=42) (actual time=0.128..865.131 rows=100000 loops=1)
        Filter: (data @? '$."A"[*]?(!(exists (@."index2")))'::jsonpath)
        Rows Removed by Filter: 100000
        SubPlan 1
          ->  Aggregate  (cost=1.51..1.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
                ->  Function Scan on jsonb_array_elements element  (cost=0.01..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=4 loops=100000)
Planning Time: 0.268 ms
Execution Time: 1156.836 ms

As you can see, I tested it with 200,000 rows, half of which needed an update.
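
If you have not used jsonpath before: the @? operator returns true when the path expression matches at least one item, so the WHERE clause only touches rows where some element of "A" is missing index2. A quick sanity check on made-up sample values:

SELECT '{"A": [{"index": "1", "index2": "50"}, {"index": "2"}]}'::jsonb
       @? '$.A[*] ? (!exists(@.index2))';   -- true: the second element lacks index2

SELECT '{"A": [{"index": "1", "index2": "50"}]}'::jsonb
       @? '$.A[*] ? (!exists(@.index2))';   -- false: nothing to update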

Note that you had the operands of || switched. Also, my solution requires Postgres 12 or newer; version 11 does not support jsonpath. Which version do you use?
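
For reference, here is the difference on a single element; with jsonb concatenation the right-hand operand wins on key conflicts:

-- default on the left: the existing "50" survives
SELECT '{"index2": null}'::jsonb || '{"index": "1", "index2": "50"}'::jsonb;
-- {"index": "1", "index2": "50"}

-- operands switched (as in your query): the default null overwrites "50"
SELECT '{"index": "1", "index2": "50"}'::jsonb || '{"index2": null}'::jsonb;
-- {"index": "1", "index2": null}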


Reply to comment about Postgres 10:

For older versions, I did not manage to express the filter directly in the condition. It looks like you can define a function instead:

create or replace function misses_index2(data jsonb) returns boolean
    language sql
as
$$
select count(*) filter ( where element #> '{index2}' IS NULL ) > 0
from JSONB_ARRAY_ELEMENTS(data -> 'A') element
$$;

(Tested on Postgres 13 but should work even on Postgres 10.)

Then the where condition is simply:

WHERE misses_index2(data);
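
Putting the two pieces together, the full statement for older versions is the same rewrite as above, just with the function in the WHERE clause (a sketch; I only measured it on Postgres 13):

UPDATE my_table
SET data = JSONB_SET(data, '{A}',
                     (SELECT JSONB_AGG('{"index2": null}' || element)
                      FROM JSONB_ARRAY_ELEMENTS(data -> 'A') element))
WHERE misses_index2(data);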

The query is about twice as slow, but that is expected.

Update on my_table  (cost=0.00..161007.00 rows=66990 width=42) (actual time=2413.235..2413.236 rows=0 loops=1)
  ->  Seq Scan on my_table  (cost=0.00..161007.00 rows=66990 width=42) (actual time=0.351..2123.757 rows=100000 loops=1)
        Filter: misses_index2(data)
        Rows Removed by Filter: 100000
        SubPlan 1
          ->  Aggregate  (cost=1.51..1.52 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
                ->  Function Scan on jsonb_array_elements element  (cost=0.01..1.00 rows=100 width=32) (actual time=0.001..0.001 rows=4 loops=100000)
Planning Time: 0.303 ms
Execution Time: 2413.347 ms

Upvotes: 1
