Reputation: 13417
I have the following query which is extracting several columns of data out of a JSON object (facebook_results
Postgres 10 column of json
datatype).
Sometimes the arrays in this object contain 10,000+ items.
The goal of this is to get a flat map of denormalized data from every column in the object, and where there's an array I want to get all the columns with the objects contained within there too (and obviously just duplicating the data down for the outer keys).
None of the innermost keys contain arrays, so I don't need to worry about that. I only care about the matches
and nodes
arrays which should be 'expanded'.
Right now the query works but it's extremely, extremely slow. I'm assuming because it's because of a poorly written query performing that is recursive or has unnecessary complexity slowdowns.
SELECT
id AS slice_id,
json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'size' AS match_size,
json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'score' AS match_score,
json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'width' AS match_width,
json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'format' AS match_format,
json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'domain' AS match_domain,
json_array_elements(json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'nodes') -> 'table' -> 'crawl_date' AS node_crawl_date,
json_array_elements(json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' -> 'nodes') -> 'table' -> 'url' AS node_url
FROM slices
WHERE id = 169
Here's an example of what's contained in the facebook_results
column:
{
"table":{
"matches": [
{
"table":{
"nodes":[
{
"table":{
"crawl_date":"2013-06-21",
"url":"http://example.com"
}
}
],
"size":7962624,
"score":47.059,
"width":3456,
"format":"MP4",
"domain":"example.com"
}
}
]
}
}
Does anyone have an idea how I could optimize this?
Upvotes: 1
Views: 551
Reputation: 175706
You could rewrite your query using LATERAL
:
SELECT
id AS slice_id,
s.t -> 'size' AS match_size,
s.t -> 'score' AS match_score,
s.t -> 'width' AS match_width,
s.t -> 'format' AS match_format,
s.t -> 'domain' AS match_domain,
s.t2-> 'crawl_date' AS node_crawl_date,
s.t2-> 'url' AS node_url
FROM slices
,LATERAL (
SELECT json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table',
json_array_elements(json_array_elements(facebook_results -> 'table' -> 'matches')
-> 'table' -> 'nodes') -> 'table') s(t,t2)
WHERE id = 169;
Or even shorter:
SELECT
id AS slice_id,
s.t -> 'size' AS match_size,
s.t -> 'score' AS match_score,
s.t -> 'width' AS match_width,
s.t -> 'format' AS match_format,
s.t -> 'domain' AS match_domain,
s2.t2 -> 'crawl_date' AS node_crawl_date,
s2.t2 -> 'url' AS node_url
FROM slices
,LATERAL(SELECT
json_array_elements(facebook_results -> 'table' -> 'matches') -> 'table' ) s(t)
,LATERAL(SELECT json_array_elements(s.t -> 'nodes') -> 'table') s2(t2)
WHERE id = 169;
Upvotes: 4