Apprentice Programmer
Apprentice Programmer

Reputation: 1515

How to update a large (1 million+ rows) postgres column of jsonb type values

Trying to update a specific array inside of a jsonb type in a column called params , and having issues with how long it’s taking. For example, there is a table with a row that contains an array owners

{
  "hidden": false,
  "org_id": "34692",
  "owners": [
    "[email protected]"
  ],
  "deleted": false
}

And another example

{
  "hidden": false,
  "org_id": "34692",
  "owners": [
    "[email protected]"
  ],
  "deleted": false
}

And there’s essentially a million of these rows (all with different email domains as owners. I have this query which I want to execute across all of these rows:

UPDATE table 
  set params = CASE 
                 WHEN params->>'owners' NOT LIKE '%google.com%' THEN jsonb_set(params, '{owners}', concat('"', substr(md5(random()::text), 0, 25), '@googlefake.com"')::jsonb) 
                 ELSE params 
               END

I’ve tested with a dataset of 100, and it executes perfectly time, but doing this with a 1000x multiple, makes the query forever execute, and I’ve no clue if it will actually successfully complete. Not entirely sure how to speed up this process or utilize this in a better fashion. I did try indexing e.g. CREATE INDEX ON table((params->>'owners')); to no avail. Query has run >1 hour, and there are multiple rows similar to this.

Am i indexing incorrectly? Also, I've looked into the gin operator and @> won't help since each owner field differs

Update:

UPDATE table AS "updatetarget" 
                 SET params = jsonb_set(params, '{owners}', concat('"', substr(md5(random()::text), 0, 25), '@googlefake.com"')::jsonb) 

query updated to this and still taking awhile. Is there some way to index the key so i can make the second query faster?

Upvotes: 1

Views: 881

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 247215

Avoid unnecessary updates with a WHERE clause that filters out the rows that don't need to be modified.

UPDATE table 
  set params = jsonb_set(
                  params,
                  '{owners}',
                  concat(
                     '"',
                     substr(md5(random()::text), 0, 25),
                     '@googlefake.com"'
                  )::jsonb
               ) 
WHERE params->>'owners' NOT LIKE '%google.com%';

If a lot of rows are affected, you may want to run VACUUM (FULL) once the update is done.

Upvotes: 1

Related Questions