alphadogg
alphadogg

Reputation: 12900

Recursive JSON generation in PostgreSQL

I have the following tables in a PostgreSQL 9.5 server: tables

The notable structural thing is that location is theoretically infinitely recursive. I need to generate a JSON message from a root location, recursing into all sub-locations; every location has some properties, an array of inventory items and an array of child locations.

How do I create a performant query for this? I am looking at various PostgreSQL JSON functions, the LATERAL keyword, CTEs, and getting a bit confused. I've done JSON output with non-recursive queries, but not sure how to handle recursion cleanly.

Here's an example output:

{
  "id": 1000,
  "name": "By Location",
  "type": "SITE",
  "locations": [
    {
      "id": 1005,
      "name": "Storage A",
      "type": "STOR",
      "locations": [ ...(same schema as parent)... ],
      "inventories": [ ...(see below for schema)... ]
    },
    {
      "id": 1017,
      "name": "Storage B",
      "name": "COLD",
      "locations": [ ...(same schema as parent)... ],
      "inventories": [...(see below for schema)... ]
    }
  ],
  "inventories": [
    {
      "id": 5340,
      "product_id": 9120,
      "name": "Product X",
      "thumb": "https://example.com/api/images/nnnn.jpg",
      "sort_order": 1,
      "par_level": 3.5,
      "created": 1452898800,
      "updated": 1453071600,
      "measures": [
        {"id": 3498, "quantity": 2.25, "created": 1453071600, "updated": 1453071600},
        {"id": 3456, "quantity": 3.25, "created": 1452898800, "updated": 1452898800}
      ]
    }
  ]
}

Upvotes: 4

Views: 1959

Answers (2)

alphadogg
alphadogg

Reputation: 12900

I ended up creating three functions. Probably could have done less, but the functions would be reusable in other queries. Basically, any place in the JSON output that should have an array of values, that gets handled by a function that returns a recordset that gets json_agg()'ed.

CREATE OR REPLACE FUNCTION get_measures_by_inventory_as_json(invid UUID, del TIMESTAMP WITH TIME ZONE DEFAULT now())
RETURNS TABLE(inventory_id UUID, measure_json JSON)
AS $$
  -- returns a JSONified record per measure tied to an inventory record
  SELECT m.inventory_id, json_build_object(
    'id', m.id, 
    'quantity', m.quantity,
    'read', TRUNC(EXTRACT(EPOCH FROM m.read_date)),
    'created', TRUNC(EXTRACT(EPOCH FROM m.created)), 
    'updated', TRUNC(EXTRACT(EPOCH FROM m.updated)),
    'deleted', TRUNC(EXTRACT(EPOCH FROM m.deleted))
  )
  FROM measure m
  WHERE m.inventory_id = invid
  AND (m.deleted >= del);
$$
LANGUAGE sql;



CREATE OR REPLACE FUNCTION get_inventories_by_location_as_json(locid UUID, del TIMESTAMP WITH TIME ZONE DEFAULT now())
RETURNS TABLE(location_id UUID, inventory_json JSON)
AS $$
  -- returns a JSONified set of inventory items, with product info and measures, given a location
  SELECT i.location_id, json_build_object(
    'id', i.id,
    'product_id', p.id,
    'name', p.name,
    'mass_quantity', p.mass_quantity,
    'mass_unit', um.code,
    'count_unit', uc.code,
    'thumb', p.product_picture_uri,
    'sort_order', i.sort_order,
    'par_level', i.par_level,
    'created', TRUNC(EXTRACT(EPOCH FROM i.created)),
    'updated', TRUNC(EXTRACT(EPOCH FROM i.updated)),
    'deleted', TRUNC(EXTRACT(EPOCH FROM i.deleted)),
    'measures', COALESCE((SELECT json_agg(measure_json) FROM get_measures_by_inventory_as_json(i.id)), '[]')::json
  )
  FROM inventory i 
  INNER JOIN product p ON i.product_id = p.id
  LEFT JOIN unit um ON p.mass_unit_id = um.id
  LEFT JOIN unit uc ON p.count_unit_id = uc.id
  WHERE i.location_id = locid
  AND i.deleted >= del
  AND p.deleted >= del;
$$
LANGUAGE sql;



CREATE OR REPLACE FUNCTION get_inventories_recursive_as_json(locid UUID[], del TIMESTAMP WITH TIME ZONE DEFAULT now())
RETURNS JSON
AS $$
  -- returns JSONified location info and inventories in that location
  -- and recurses into child locations, showing the same
  SELECT json_agg(loc) FROM (
    SELECT l.id, array_agg(c.id), json_build_object(
    'id', l.id,
    'name', l.name,
    'type', t.code,
    'locations', get_inventories_recursive_as_json(array_agg(c.id)),
    'inventories', COALESCE((SELECT json_agg(inventory_json) FROM get_inventories_by_location_as_json(l.id)),'[]')::json
    ) AS loc
    FROM location l
    LEFT OUTER JOIN location c ON l.id = c.parent_id
    INNER JOIN location_type t ON l.location_type_id = t.id
    WHERE l.id = ANY(locid)
    AND l.deleted >= del
    GROUP BY l.id, l.name, t.code
  ) AS out;
$$
LANGUAGE sql;

Tried to do it via a CTE, which would have been so elegant, but was unable to figure out how to do it without running afoul the errors related to the inability to aggregate in the recurse.

WITH RECURSIVE locations AS (
  WITH inventories AS (
    WITH measures AS (
     SELECT m.inventory_id, json_agg(json_build_object(
        'id', m.id, 
        'quantity', m.quantity,
        'read', TRUNC(EXTRACT(EPOCH FROM m.read_date)),
        'created', TRUNC(EXTRACT(EPOCH FROM m.created)), 
        'updated', TRUNC(EXTRACT(EPOCH FROM m.updated)),
        'deleted', TRUNC(EXTRACT(EPOCH FROM m.deleted))
      )) as measures
      FROM measure m
      GROUP BY m.inventory_id
    )
    SELECT i.location_id, json_agg(json_build_object(
      'id', i.id,
      'product_id', p.id,
      'name', p.name,
      'mass_quantity', p.mass_quantity,
      'mass_unit', um.code,
      'count_unit', uc.code,
      'thumb', p.product_picture_uri,
      'sort_order', i.sort_order,
      'par_level', i.par_level,
      'created', TRUNC(EXTRACT(EPOCH FROM i.created)),
      'updated', TRUNC(EXTRACT(EPOCH FROM i.updated)),
      'deleted', TRUNC(EXTRACT(EPOCH FROM i.deleted)),
      'measures', COALESCE(m.measures, '[]')
    )) AS inventories
    FROM inventory i 
    INNER JOIN product p ON i.product_id = p.id
    LEFT JOIN unit um ON p.mass_unit_id = um.id
    LEFT JOIN unit uc ON p.count_unit_id = uc.id
    LEFT JOIN measures m ON i.id = m.inventory_id 
    GROUP BY i.location_id
  )

  SELECT null as id, null as name, null as type, null as inventories
  FROM location l
  INNER JOIN location_type t ON l.location_type_id = t.id
  LEFT OUTER JOIN inventories i ON l.id = i.location_id
  GROUP BY l.parent_id

  /*UNION ALL

  SELECT p.id, p.parent_id, p.name, t.code, COALESCE(i.inventories, '[]')::jsonb AS inventories, json_agg(row_to_json(c.*))
  FROM location p
  INNER JOIN location_type t ON p.location_type_id = t.id
  LEFT OUTER JOIN inventories i ON p.id = i.location_id
  INNER JOIN locations c ON p.id = c.parent_id
  GROUP BY p.id, p.name, t.code, COALESCE(i.inventories, '[]')::jsonb*/

)
SELECT * FROM locations

Upvotes: 0

DavidNJ
DavidNJ

Reputation: 189

Let's break it into pieces. First, you will have nested subqueries to create the nested arrays. Common Table Expressions may help.

The other tricks are row_to_json and json_agg.

The first gotcha is that row_to_json needs the table as an argument to return the correct labels.

select json_agg(locations) from locations

Will return a json object for each row. To use only certain fields you will need to either create a type and cast to it or use a CTE and the syntax above. I'd use the CTE in most cases.

So you will end up with something like:

WITH lowlevel1 AS 
( 
       SELECT a, 
          b, 
          c 
       FROM   tab1) ,lowlevel2 AS 
( 
       SELECT b, 
          c, 
          d 
       FROM   tab2) ,midlevel1 AS 
( 
        SELECT          e, 
                f, 
                g, 
                json_agg(lowlevel1) AS lab1, 
                json_agg(lowlevel2) AS lab2 
        FROM            tab3 
        LEFT OUTER JOIN lowlevel1 
        ON              tab3.id = lowlevel1.parent 
        LEFT OUTER JOIN lowlevel2 
        ON              tab3.id = lovlevel2.parent)
SELECT row_to_json(midlevel1) from midlevel1

or on the last line use json_agg(midlevel1) instead row_to_json(midlevel1) to return one array of all rows.

CTEs also support regression with the RECURSIVE modifier. However, that returns one table with the results of the regression and not a nested JSON structure. As a result, you will probably need to explicitly code the desired levels of nesting.

If an element doesn't exist Postgres will return null. For example a list of sub-locations where there are none will return "locations":[null]. To replace that with a more meaningful result the case when <> then <> else '[]' end or if <> then <> else '[]' end can be used. The first is the 'searched case' where each test is a Boolean expression.

Upvotes: 1

Related Questions