Tallboy
Tallboy

Reputation: 13417

Postgres: How to optimize this query which uses multiple json_array_elements() calls

I have the following query which is extracting several columns of data out of a JSON object (facebook_results Postgres 10 column of json datatype).

Sometimes the arrays in this object contain 10,000+ items.

The goal of this is to get a flat map of denormalized data from every column in the object, and where there's an array I want to get all the columns with the objects contained within there too (and obviously just duplicating the data down for the outer keys).

None of the innermost keys contain arrays, so I don't need to worry about that. I only care about the matches and nodes arrays which should be 'expanded'.

Right now the query works but it's extremely, extremely slow. I'm assuming because it's because of a poorly written query performing that is recursive or has unnecessary complexity slowdowns.

SELECT
  id AS slice_id,
  json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'size'       AS match_size,
  json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'score'      AS match_score,
  json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'width'      AS match_width,
  json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'format'     AS match_format,
  json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'domain'     AS match_domain,
  json_array_elements(json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'nodes') -> 'table' -> 'crawl_date' AS node_crawl_date,
  json_array_elements(json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'nodes') -> 'table' -> 'url'        AS node_url
FROM slices
WHERE id = 169

Here's an example of what's contained in the facebook_results column:

{
  "table":{
    "matches": [
      {  
        "table":{  
          "nodes":[  
            {  
              "table":{  
                "crawl_date":"2013-06-21",
                "url":"http://example.com"
              }
            }
          ],
          "size":7962624,
          "score":47.059,
          "width":3456,
          "format":"MP4",
          "domain":"example.com"
        }
      }
    ]
  }
}

Does anyone have an idea how I could optimize this?

Upvotes: 1

Views: 551

Answers (1)

Lukasz Szozda
Lukasz Szozda

Reputation: 175706

You could rewrite your query using LATERAL:

SELECT
  id AS slice_id,
  s.t -> 'size'       AS match_size,
  s.t -> 'score'      AS match_score,
  s.t -> 'width'      AS match_width,
  s.t -> 'format'     AS match_format,
  s.t -> 'domain'     AS match_domain,
  s.t2-> 'crawl_date' AS node_crawl_date,
  s.t2-> 'url'        AS node_url
FROM slices
,LATERAL (
SELECT json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table',
json_array_elements(json_array_elements(facebook_results -> 'table' -> 'matches') 
           -> 'table' -> 'nodes') -> 'table') s(t,t2)
WHERE id = 169;

DBFiddle Demo

Or even shorter:

SELECT
  id AS slice_id,
  s.t   -> 'size'       AS match_size,
  s.t   -> 'score'      AS match_score,
  s.t   -> 'width'      AS match_width,
  s.t   -> 'format'     AS match_format,
  s.t   -> 'domain'     AS match_domain,
  s2.t2 -> 'crawl_date' AS node_crawl_date,
  s2.t2 -> 'url'        AS node_url
FROM slices
,LATERAL(SELECT 
  json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' ) s(t)
,LATERAL(SELECT json_array_elements(s.t -> 'nodes') -> 'table') s2(t2)
WHERE id = 169;

DBFiddle Demo2

Upvotes: 4

Related Questions